perm filename NOTES.JRA[LSP,JRA]20 blob
sn#242099 filedate 1976-10-18 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00055 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00006 00002 .SEC(Evaluation of LISP expressions,Evaluation,,P113:)
C00031 00003 .SS(S-expr translation of LISP expressions)
C00043 00004 .SS(Symbol tables,symbol tables,P92:)
C00052 00005
C00055 00006 .SS(%3λ%*-notation,%3λ%*-notation,P49:)
C00071 00007 .SS(Mechanization of evaluation,,P6:)
C00087 00008 .SS(Examples of %3eval%*,examples of %3eval%*)
C00092 00009 It should now be clear that %3eval%* and friends %2do%*
C00101 00010 .<<the | | symbol table >>
C00113 00011 .SS(Variables,variables,P135:)
C00128 00012 .SS(Environments and bindings,,P77:)
C00137 00013 .<<RED'S FACT>>
C00141 00014 Notice in this example that we will get the correct binding of %3x%* locally.
C00145 00015 .CENT(Problem)
C00147 00016 .SS(%3label%*,%3label%*,P38:)
C00155 00017 .SS(Functional Arguments and Functional Values,functional argument,P76:)
C00189 00018 .<<pic of funarg execution>>
C00202 00019 .SS(Binding strategies and implementations,binding strategy,P134:)
C00215 00020 .SS(Special Forms,Special Form,P8:)
C00222 00021 .SS(The %3prog%*-feature,,P37:)
C00242 00022 The ability to evaluate the argument to %3go%1 results in
C00249 00023 Now that assignment statements are out in the open let's re-examine "<=".
C00251 00024 .REQUIRE "NOTES.EVA" SOURCE_FILE
C00252 00025 .SS(Function definitions,,P202:)
C00263 00026 .SS(Rapprochement: In retrospect,,P90:)
C00280 00027 .ss(The LISP machine,,P175:)
C00296 00028 .SEC(Implementation of the Static structure of LISP,static structure,,P188:)
C00301 00029 .SS(Representation of S-expressions)
C00317 00030 .SS(Representation of LISP primitives,,P26:)
C00330 00031 .CENT(AMBIT/G)
C00335 00032 .SS(A few programming techniques,,P214:)
C00341 00033 .SS(Symbol tables and property-lists,,P128:)
C00362 00034 .SS(Property-list functions)
C00366 00035 .SS(An %3eval%1 for p-lists)
C00378 00036 .SS(Representation of property-lists)
C00384 00037 .GROUP
C00396 00038 With such print-names on each property-list
C00402 00039 .<<pic of NIL>>
C00407 00040 .SS(Input/Output: %3read%* and %3print%*,,P13:)
C00422 00041 .SS(Table searching: Hashing,hashing,P14:)
C00432 00042 .GROUP
C00450 00043 .SS(A first look at %3cons%1)
C00462 00044 .SS(Storage management: garbage collection,garbage collection,P146:)
C00469 00045 .CENT(A simple LISP garbage collector written in LISP)
C00479 00046 .CENT(Problems)
C00482 00047 .GROUP
C00488 00048 .SS(Implementations of binding,,P78:)
C00501 00049 .SS(stack implementation of deep binding,,P148:)
C00508 00050 .SS(stack implementation of shallow binding,,P225:)
C00528 00051 .CENT(An example of shallow binding and %3FUNARG%*)
C00533 00052
C00540 00053 .ss(Strategies for LISP implementation,,P228:)
C00547 00054 .SS(Epilogue)
C00552 00055 .END "JRA"
C00553 ENDMK
C⊗;
.SEC(Evaluation of LISP expressions,Evaluation,,P113:)
.BEGIN indent 20,20;
"... I always worked with programming languages because it seemed to me
that until you could understand those, you really couldn't understand
computers. Understanding them doesn't really mean only being able to use
them. A lot of people can use them without understanding them. ..."
.END
.BEGIN TURN ON "→";
→Christopher Strachey⊗↑[Str#74]↑
.END
.BEGIN "JRA" TURN ON "←","#";
.SS(Introduction)
%1
In the previous sections of this text we have talked
about some of the schemes for evaluation. We have done so rather
informally for LISP; we have been more precise about evaluation
of simple arithmetic expressions. {yonss(P2)} discusses this in
some detail.
We shall now look more closely at the informal
process which we have been using in the ⊗→evaluation↔← of LISP expressions.
This is motivated by at least two desires.
First, we want to run our LISP programs on a machine.
To do so requires the implementation of some kind of
translator to turn LISP programs into instructions which
can be carried out by a conventional machine.
We will be interested in the structure of such implementations.
Any implementation of LISP must be grounded on a precise,
and clear understanding of what LISP-evaluation
entails. Indeed, a deep understanding
of evaluation is a prerequisite for implementation of %2any%* language.
The question of evaluation cannot be sidestepped by basing a language
on a compiler. A compiler must produce code which when executed, simulates
the evaluation process. There is no escape.
Our second reason for pursuing evaluation involves the question
of programming language specification. This title covers a multitude
of sins.
At a practical level we want a clean, machine independent, "self-evident"
description of a language specification, so that the agony involved
in implementing the design can be minimized. At a more abstract level,
we should try to understand just what %2is%* specified when we design a
language. Are we specifying a single machine, a class of machines, a
class of mathematical functions. Just what is a language?
The syntactic specification of languages is reasonably well established,
but syntax is only the tip of the iceberg. Our study of LISP
will address itself to the deeper problems of semantics, or meaning,
of languages.
Before we address the direct question of LISP evaluation, we
should perhaps wonder aloud about the efficacy of studying languages in the
detail which we are proposing.
As computer scientists we should be curious about the
structure of programming languages because we
must understand our tools
--#our programming#languages. People who simply wish to
%2use%* computers as tools need not
care about the structure of languages. Indeed they usually
couldn't care less about the inner workings of the language; they only
want languages
in which they can state their problems in a reasonably
natural manner.
They want their programs to run and get results. They are interested in
the output and seldom are interested in the detailed process of
computation.
For a simple analogy, consider the field of mathematics.
The practicing mathematician uses his tools --#proofs#-- in a similar
manner to the person interested in computer applications.
He seldom needs to examine questions like "what is a
proof?" He does not analyze his tools.
However, it must be pointed out that not so many years ago
such questions indeed were raised, and for good reason.
Some common forms of reasoning were shown to lead to
contradictions unless care was taken.
Our position is more like that of the foundations of mathematics;
there the tools of mathematics %2are%* studied and held up to the
light. Mathematics has flourished because of it. Though our expectations
are not quite that presumptuous, we %2do%* expect that
programming language design cannot help but be improved.
Our study of language implementation will proceed from the abstract to
the concrete. Each level will intimately involve the study of data structures.
The current chapter will be the most abstract, building a precise
high-level description of an evaluation scheme for LISP. In fact,
the discussion is much more general than that of LISP; the text addresses
itself to problem areas in the design of any reasonably sophisticated
language. In subsequent chapters we probe beneath the surface of this
high-level description and discuss common ways of implementing
the necessary data structures.
But how can we begin to understand LISP evaluation?
In {yonss(P2)} we made a beginning, giving an algorithm for
a subset of the computations expressible in LISP.
This subset covered evaluation of some simple arithmetic expressions.
From our earliest grade school days we have had to evaluate simple
arithmetic expressions. Later, in algebra we managed to cope with
expressions involving function application. Most of us survived the
experience. We should now try to understand the processes we used in these
simple arithmetic cases, doing our examination at the most mechanical
level.
The basic intent of the algorithm is fixed: evaluate the expression; but within
that general constraint we often have several distinct alternative methods.
Those places at which we have choice should be remembered.
We will make reasonable choices so that
the process becomes deterministic and proceed. Later, we should
reflect on what effect our choices had on the resulting scheme.
For example, recall the discussion of the representation of symbol tables on
{yon(P186)}. We had several options, but picked one which seemed to
satisfy our intuitions and was reasonably efficient.
But we should subject that decision to close scrutiny: does it
really fulfill our expectations and is it efficient? In absence of
absolute standards, these questions are usually answered by examining the
behavior of the algorithm.
The first thing to note in examining simple arithmetic
examples is that %2nothing%* is really said about the process
of evaluation.
When asked to evaluate
%3(2*3)#+#(5*6)%* we never specified which summand was to be evaluated first.
Indeed it didn't matter here. %36#+#(5*6)%* or %3(2*3)#+#30%* both
end up %336%*.
Does it %2ever%* matter? "+" and "*" are examples of arithmetic functions;
can we always leave
the order of evaluation unspecified for arithmetic functions?
How about evaluation of arbitrary functional expressions?
If the order doesn't matter, then the specification of the evaluation
process becomes much simpler. If it %2does%* matter then we must
know why and where.
.P98:
We have seen that the order of evaluation %2can%* make a difference in LISP.
On {yon(P17)} we saw that order of
evaluation in conditional expressions can make a difference.
On {yon(P106)} we saw that %aCBV%1, LISP's computational
interpretation of function evaluation, requires some care.
Since we are using %aCBV%1 we must make %2some%* decision regarding
the order of evaluation of the arguments to a function call,
say %3f[t%41%*;t%42%*; ...t%4n%1]. We will assume that we will evaluate the arguments
from left to right.
Consider the example due to J. Morris:
.P84:
.BEGIN CENTERIT;
←%3f[x;y] <= [x = 0 → 0; %et%* → f[x-1;f[y-2;x]]]%1.
.END
Evaluation of %3f[2;1]%*
will terminate if we always evaluate the outermost occurrence of %3f%*.
Thus:
←%3f[2;1] = f[1;f[-1;2]] = f[0;f[f[-1;2]-2;1]] = 0;%*
However if we evaluate the innermost occurrences⊗↓The notions of "innermost"
and "outermost" evaluation need to be slightly embellished for general
function application. If the chosen application has several arguments, then
we must specify an order for their evaluation. Thus terms like "outermost-leftmost"
and "innermost-rightmost" occur. For example, the LISP scheme is an instance of
"innermost-leftmost" evaluation.←
first, the
computation will not terminate:
←%3f[2;1] = f[1;f[-1;2]] = f[1;f[-2;f[0;-1]]] = f[1;f[-2;0]] = ... .%*
So the choice of evaluation schemes is, indeed, fraught with peril.
The evaluation scheme, %aCBV%1, which we chose
is called %2⊗→call-by-value↔←%* and is an %2inside-out%* style of evaluation,
meaning that we operate on the subexpressions before tackling the main expression.
Alternative proposals exist
and have their merits; ⊗→call-by-name↔← evaluation is another
well-known scheme. We advertised this outside-in scheme on {yon(P126)} as %aCBN%*.
From a computational
perspective, we can live with call-by-value, though we know the use of such a rule
may lead to non-terminating computations when call-by-name would run to
completion.
Intuitively, call-by-value says:
evaluate the arguments to a function before you attempt to apply the
function definition to the arguments.
Let's look at a simple arithmetic example.
Let %3f[x;y]%* be %3x%82%*#+#y%1 and consider %3f[3+4;2*2]%*. Then
call-by-value says evaluate the arguments, getting %37%* and %34%*;
associate those values
with the formal parameters of %3f%* (i.e. %37%* with %3x%* and
%34%* with %3y%*) and then evaluate the body of
%3f%* resulting in #%37%82%*#+#4#=#53%1. This is the scheme we
captured in {yonss(P2)}.
Call-by-name says pass the %2unevaluated%* actual
parameters to the function, giving %3(3+4)%82%*#+#2*2%1
which also evaluates to %353%*.
Call-by-name is essentially a "substitution followed by simplification"
system of computation.
We will say more about call-by-name and other styles of
evaluation in {yonss(P187)}.
Most of this section will be restricted to call-by-value.
If you look at the structure of %3value%9''%1 and %3apply%9'%1 beginning
on {yon(P125)} you will see that they encode the %aCBV%* philosophy,
are recursive, and have an intended interpretation which
goes something like this:
%21.%* If the expression is a constant then the value
of the expression is that constant. (The value of %33%* is %33%*
⊗↓We are ignoring the distinction between the %2numeral%* %33%*
and the %2number%3 3.%1←).
%22.%* If the expression is a variable then see what the current value associated
with that variable is. Within the evaluation of, say,
%3f[3;4]%* where %3f[x;y]#<=#x%82%*#+#y %1the current value of the variable
%3x%* is %33%*.
%23.%* The only other kind of arithmetic expression that we can have is a function
name followed by arguments, for example %3f[3;4]%*.
In this case we first evaluate the arguments ⊗↓Here we are using the evaluation
process recursively.←
and then apply the definition of the function to those
evaluated arguments. How do we apply the function definition
to the evaluated arguments?
We associate or ⊗→bind↔← the formal parameters (or variables)
of the definition to the values of the actual parameters.
We then evaluate the body of the function in this new environment.
Notice that we do %2not%* explicitly substitute the values for the variables
which appear in an expression. We simulate substitutions by table lookup.
.P80:
A moment's reflection on the informal evaluation process
we use in LISP should convince us of the plausibility of
describing LISP evaluation at a similar level of precision.
First, if the LISP expression is a constant, then the value of the
expression is that constant.
What are the constants of LISP? They're just the S-exprs.
Thus
the value of %3(A#.#B)%* is %3(A#.#B)%*, just like the value of %33%* is %33%*.
Variables and functional applications appear in LISP and are handled as described
above in %22.%1 and %23.%1.
The additional artifact of LISP which we have to include in a discussion of
evaluation is the conditional expression. But clearly its evaluation can also
be precisely specified. We did so on {yon(P40)}.
So, in more specific detail, here is some of the structure of the
LISP evaluation mechanism:
%21.%* If the expression to be evaluated is a constant then the value
is that constant.
%22.%* If the expression is a variable find its value in the current environment.
%23.%* If the expression is a conditional expression then it is of the form
%3[p%41%* → e%41%*; p%42%* → e%42%*; ... p%4n%* → e%4n%1]. Evaluate it
using the semantics defined on {yon(P40)}.
%24.%* If the expression is of the form: %3f%1[%3t%41%3;t%42%3;#...#;%3t%4n%3]%1
then:
.BEGIN INDENT 10,10,10;GROUP;
%2a.%1 Evaluate the arguments %3t%41%3,#t%42%3,#...#,%3t%4n%1
from left to right.
%2b.%1 Find the definition of the function, %3f%1.
%2c.%1 Associate the evaluated arguments with the formal parameters
in the function definition.
%2d.%1 Evaluate the body of the function, while remembering the values
of the variables.
.END
We have seen ({yonss(P2)}) that we can transcribe
a simple kind of arithmetic evaluation into a recursive LISP program.
That program operates on a representation of the expression and
produces the value. Most of our work in that example was done
without giving explicit details of the representation.
However when we discussed the representation of simple differentiation
in {yonss(P41)} we showed a detailed representation.
We have demonstrated an informal, but reasonably precise, evaluation scheme
for LISP;
our discussion is ready for a more formal development.
It should be clear that we could write a LISP
function representing the evaluation process provided that we can
find a representation for LISP expressions as S-expressions.
This mapping, %eR%*, of LISP expressions to S-exprs is our first order of business. We
will accomplish this mapping by using an extension of the scheme introduced in
{yonss(p41)}.
We plan to expend some effort in describing a specific representation
for LISP expressions for two reasons. First, though abstraction
is a most desirable attribute, we must reconcile our abstraction
with reality; our programs must run. The second point is that
the representation we pick will have a very close relationship to the
way we present programs to the machine. So we will be careful and
thorough in describing the mapping and you should be conscientious
in your understanding of that mapping.
Once that representation is given we will produce a LISP
algorithm which describes the
evaluation process used in LISP.
This process of mapping LISP expressions onto S-exprs and writing a LISP
function to act as an evaluator may seem a bit incestuous;
indeed, the rationale for doing any of this may not be apparent.
Patience, please.
The mapping is no more obscure than that in the polynomial evaluation or
differentiation examples.
It is just another instance of the diagram of {yon(P46)},
only now we are applying the process to LISP itself. The effect is to force
us to make precise exactly what is meant by LISP evaluation. This
precision will have many important ramifications.
Also, we've been doing evaluation of S-expr representations of
LISP expressions already.
The %2great mother of all functions%* is exactly the evaluation mechanism for
the LISP primitive functions and predicates, %3car, cdr, cons, atom%* and %3eq%*
when restricted to functional composition and constant arguments.
The %2great mother revisited%* is the extension to conditional expressions.
In the next section we will give a typical mapping of LISP expressions
to elements of %d<sexpr>%*. But remember that we should attempt
to keep the knowledge of the representation out of the structure
of the algorithm.
Let's stop for some examples of translating LISP functions into S-expr notation.
.SS(S-expr translation of LISP expressions)
.BEGIN
We will go through the list of LISP constructs, describing the
effect of the representational map, %eR%*, and give a few examples
applying %eR%*.
.GROUP
%1We will represent numerals just as numerals, e.g.:
.BEGIN CENTERIT;
←%eR%f( %1<numeral> %f)%3 = %1<numeral>
←%eR%f( %32 %f)%3 = 2
.END
%1We will translate identifiers to their upper-case counterpart. Thus:
.BEGIN CENTERIT
←%eR%f( %1<identifier> %f)%3 = %1<literal atom>
←%eR%f( %3x %f)%3 = %3X
←%eR%f( %3y2 %f)%3 = %3Y2
←%eR%f( %3car %f)%3 = %3CAR
.END
.APART
.BEGIN FILL;
%1Now we've got a problem:
We wish to map arbitrary LISP expressions to S-expressions. The LISP expression
%3x%* translates to %3X%*. %3X%* is itself a LISP expression (a constant);
%2it%* must also have a translation.
We must be a little careful here.
When we write Son of Great Mother we will give it an S-expr representation
of a form to be evaluated. We might give it the representation of %3car[x]%*
in which case the value computed will depend on the current value bound to %3x%*.
We might also give the representation of %3car[X]%*; in this case we should
expect to be presented with %9B%1 or an error message.
Or, for example, some function %3foo%* we wish to write may return the S-expr
representation of a LISP form as its value.
Say %3foo[1]%* returns the representation of %3car[x]%* and %3foo[2]%* returns
the representation of %3car[X]%*. We must be able to distinguish between these
representations.
That is, given the representation,
there should be exactly one way of interpreting it as a LISP expression.
The mapping must be 1-1. So we must represent %3x%* and %3X%* as %2different%* S-exprs.
The translation scheme we pick is:
for any S-expression %9α%*, its translation is %3(⊗→%3QUOTE%*↔← %9α%3)%1.
.END
.GROUP
.SELECT 1;
.BEGIN CENTERIT
←%eR%f( %1<sexpr> %f)%3 = %3(QUOTE %1<sexpr>%3)%1
.END
For example:
.BEGIN CENTERIT
←%eR%f( %3X %f)%3 = %3(QUOTE X)%1
←%eR%f( %3(A .B) %f)%3 = %3(QUOTE (A . B))%1
←%eR%f( %3QUOTE %f)%3 = %3(QUOTE QUOTE)%1
.END
.APART
.GROUP
%1
.BEGIN FILL;
We must also show how to map expressions of the form %3f[e%41%*#;#...#;e%4n%*]%1
onto S-exprs.
We have already seen one satisfactory mapping for functions in prefix form
in {yonss(P41)}.
We will use that mapping, called Cambridge Polish⊗↓The name, Cambridge Polish,
is derived from two sources: Cambridge, since M.I.T. is in Cambridge Massachusetts,
and McCarthy was at M.I.T. while developing his ideas; Polish, since the representation
is a dialect of a notation developed by a school of Polish logicians.←, here. That is:
.END
.BEGIN TURN ON "←";
←%eR%f( %3f[e%41%*;e%42%*; ...;e%4n%*] %f)%3 =
( %eR%f( %3f %f)%3 %eR%f(%3 e%41%* %f)%3 %eR%f(%3 e%42%* %f)%3 ... %eR%f(%3 e%4n%* %f)%3 )%1
.END
.APART
.GROUP
%1Examples:%3
.BEGIN CENTERIT;
←%eR%f( %3car[x] %f)%3 = (%eR%f( %3car %f)%3 %eR%f( %3x %f)%3 ) = (CAR X)
←%eR%f( %3car[X] %f)%3 = (%eR%f( %3car %f)%3 %eR%f( %3X %f)%3 ) = (CAR (QUOTE X))
←%eR%f( %3cons[cdr[(A .B)];x] %f)%3 = (CONS (CDR (QUOTE (A . B))) X)
.END
.APART
.GROUP
%1The %eR%*-mapping must handle conditional expressions. A conditional is
represented as a list whose first element is %3COND%* and whose next %3n%*
elements are representations of the %3p%4i%*-e%4i%1 pairs. The %eR%*-map
of such pairs is a list of the %eR%*-maps of the two elements:
.BEGIN TURN ON "←";
←%eR%f( %3[p%41%* → e%41%*; ... ;p%4n%* → e%4n%*] %f)%3 =
(⊗→%3COND%*↔← %3(%eR%f(%3 p%41 %f)%3 %eR%f(%3 e%41%f)%3 ) ... (%eR%f(%3 p%4n%3 %f)%3 %eR%f(%3 e%4n %f)%3))
%1An example:%3
←%eR%f( %3[atom[x] →1; q[y] → X] %f)%3 = (COND ((ATOM X) 1) ((Q Y) (QUOTE X)))
.END
%1
.APART
.END
.P44:
Notice that %3(COND#...#)%* and %3(QUOTE#...#)%* %2look%* like translations
of function
applications of the form %3cond[#...#] %* and %3quote[#...#]%*.
However since we expect application to be performed using call-by-value,
we must handle these constructs in a special manner.
Indeed, note that %3quote[%9α%3]%1 stands for %eR%f(%9α%f)%1, and
that the "arguments" to %3cond%1 are not to be interpreted as some kind of
function applications. For example, %3COND#((ATOM#X)#1)#...)%1 doesn't
represent %3cond[#atom[x][1];#...#]%1.
Finally, the translations of the truth values %et%1 and %ef%1 will be
%3T%* and %3NIL%*, respectively.
.BEGIN CENTERIT;
←%eR%f( %et%* )%3 = T
←%eR%f( %ef%* )%3 = NIL
.END
You might have noticed that these last two applications of the %eR%*-mapping
have the potential to cause trouble. They will spoil the 1-1 property of %eR%*:
.BEGIN CENTERIT;
←%eR%f( %3t%* )%3 = T
←%eR%f( %3nil%* )%3 = NIL
.END
The usual way to escape from this difficulty is to outlaw %3t%* and
%3nil%* as LISP variables⊗↓In LISP#1.5 %3T%1 and %3F%1 were used as the representations of
%et%1 and %ef%1; the atoms %3T%1 and %3F%1 were (permanently) bound to values
%3*T*%1 and %3NIL%1.←.
Perhaps our concern for the %eR%*-mapping's properties appears heavy-handed
where a simple solution seems
apparent: %et%* is %et%* and %3t%* is %3t%*; when we
want the truth value we write %et%* and when we want the variable we
write %3t%*. The problem is that when we write programs in a format
which a machine version of LISP will understand, we will be writing
the %eR%*-image, rather than the LISP expression form. Thus to
ask a LISP machine to evaluate %3car[(A#.#B)]%* we present it
with %3(CAR#(QUOTE#(A#.#B)))%*. What this means is that we are presenting
our programs to the machine as data structures of the language.
It would be like expressing programs in Fortran or Algol as
arrays of integers; that is, the data structures of %2those%* languages.
We will explore the implications of this
approach to programming in later sections, but for now
it should help to know that we will be making extensive use
of the %eR%*-mapping.
You should go back and look at the %3tgm%*'s ({yonss(P39)}) now
that you know that they are evaluators for simple subsets of LISP expressions.
Note that the only atoms which the great
mothers recognize are %3T%* and %3NIL%*. Any other atoms
elicit an error message.
What do these other atoms represent? That is, of what objects are atoms
the %eR%*-maps? Well, numerals are atoms and are the %eR%*-maps of
numerals. We certainly could extend %3tgmoaf%* to handle this case.
Atoms are translations of another class of LISP expressions; they
are the translations of variables and function names.
So one task before us is to incorporate a mechanism into our simple LISP
evaluator which will handle evaluation of variables. We've already seen
the necessary mechanism in {yonss(P2)} where we studied tables as an abstract
data stucture. The other piece of LISP which did not appear in the evaluator
for polynomials was conditional expressions. Before handling conditionals
we wish to handle the informal intuitive discussion of tables in {yonss(P2)}
in a more precise manner.
.SS(Symbol tables,symbol tables,P92:)
%1
One of the distinguishing features of computer science is
its reliance on the ability to store and recover information.
Any formalism which addresses itself to computer science
must take this into account. In particular we must be able to
handle the effect of assignment or binding, that is, the association
of a value with a name in an environment. A common device used to accomplish this
is the symbol table. This is the device we used informally in {yonss(P2)}.
We will review some of that discussion here.
In essence, a symbol table is simply a set of ordered pairs
of objects; one of the
elements of each pair is a name; the other is its value.
This means that symbol tables can be characterized as relations or perhaps
even as functions.
This characterization is indeed viable. On {yon(P186)} we showed that
a table could be constructed and maintained in a manner preserving set-ness.
As an abstract operation, finding an element in a symbol table is also
quite simple: given a set of ordered pairs and a variable,
find a pair with first element the same as the given variable.
This operation can be described as function application where the function
being applied is the table and the argument is the name component.
That is: ###%3locate[x;tbl]#=#tbl(x)%1.
This level of abstraction was a bit too spartan; maintaining
such tables is computationally expensive, and we will %2have%1 to give
a %3locate%1 algorithm sooner or later.
The level of abstraction we envisioned looked on a
symbol table as a %2sequence%* of pairs, each pair representing a variable
and its corresponding value.
The algorithms given in {yonss(P2)} for manipulating tables depended
heavily on the implied sequencing of call-by-value and recursion.
Since this was consistent with the explicit sequencing used in adding
elements to the table, we achieved the desired effect. We found the
expected bindings, even though there may have been other candidates in the
tables. In the remaining sections of this chapter we will utilize more
features of this interplay between representation of data and calling style
of algorithm. Symbol tables are just the first manifestation of this
phenomenon.
These simple symbol tables are also known as ⊗→association lists↔←
or %2⊗→a-lists↔←%*;
thus %3assoc%* will be the name of our function to search a symbol table.
%3assoc%* will take two arguments: a name and a symbol table.
It will examine the table from left to right, looking for the first
match.
We will need to designate a selector, %3name%*, to locate
the name-component of a pair, and another selector, %3value%*, to
retrieve the value component.
If a pair is found,
then that pair is returned; if %2no%* such pair is found, the
result is undefined.
.BEGIN TABS 10,25;TURN ON "\";NOFILL;
.SELECT 3;
\assoc[x;l] <= \[eq[name[first[l]];x] → first[l];
\\ %et%* → assoc[x;rest[l]]]
.END
Obviously, if the table is very long and the desired pair is close to the
end of the table, then we may be in for a very long search. The search
scheme encoded in %3assoc%1 is called %2⊗→linear search↔←%1, and is
unnecessarily inefficient. However the phenomemona we wish to study here are
not related to efficiency of searching methods.
We will come back to symbol tables in {yonss(P128)} to study
the problems of efficient storage and retrieval of information.
It will suffice now simply to think of a symbol table as represented
in LISP by a list of dotted pairs, a name dotted with value.
In this representation, then, %3name[x] <= car[x]%*, and %3value[x]#<=#cdr[x]%*.
For completeness, we should also specify a constructor. Though we
won't need it for a while, its name is ⊗→%3mkent%*↔←, and
will take a name and a value and return a new entry. Its representation here is
%3mkent[x;y]#<=#cons[x;y]%*.
.GROUP
.P110:
Recall that we are representing variables as atoms; if %3x, y, %*and %3z%*
had current values %32, 3, %*and %34%*, then a symbol table describing that fact could
be encoded as:
←%3((X . 2) (Y . 3) (Z . 4)) .%1
.APART
.GROUP
.BEGIN CENTERIT;GROUP;
%1For example:
←%3assoc[Y; ((X . 2) (Y . 3) (Z . 4))] = (Y . 3)
←assoc[U; ((X . 2) (Y . 3) (Z . 4))] %1 = %9B%1.
.END
.APART
%1
How are we to represent bindings of variables to non-numeric S-exprs? For example,
how will we represent: "the current value of %3x%* is %3A%*"?
We will place the dotted-pair %3(X . A)%* in the table. Now this
representation is certainly open to question: why not add
%3(X .(QUOTE A))%*? The latter notation is more consistent
with our conception of representation espoused on {yon(P46)}.
That is, we map LISP expressions to S-expressions; perform the
calculations on this representation, and finally %2reinterpret%*
the result of this calculation as a LISP expression. The representation
we have chosen for symbol tables obviates the last reinterpretation
step. Now it will turn out that for our initial subsets of LISP
this reinterpretation step simply would involve "stripping" the %3QUOTE%*s.
The only "values" which a computation can return are constants; later
things will become more difficult. Perhaps this representation of
table entries is a poor one; we will see.
In studying any existing language, or contemplating the design of
any new one, we must question each detail of representation. Decisions
made too early can have serious consequences⊗↓However,
in defense of LISP, it must be remembered that
LISP was conceived at the time that FORTRAN was believed to be a
gift from God. Only later did we learn the true identity of the donor.←.
Before moving on we should probably take stock of our current
position; in this section
we have simply recreated the table-lookup mechanism we used
in {yonss(P2)}, only now we are paying a bit more attention
to representation.
We can locate things in a table and we have seen how calling
functions can add values to a table. We have said nothing
about adding function definitions to the tables.
Abstractly we know how to extract the definition from the table and
apply it. We must give an explicit representation of the storage of a
function. This turns out to be a reasonably non-trivial problem.
We have seen that it is possible to mechanize at least one
scheme for evaluation of functions -- call-by-value, evaluating arguments from
left to right.
We have seen that it is possible to translate LISP expressions
into S-exprs in such a way that we can write a LISP function
which will act as an evaluator for such translations. In the process
we have had to mechanize the intuitive devices we (as humans) might
use to recall the definition of functions and to recall the current
values of variables. It became clear that the mechanism
of symbol tables could be used. To associate a variable with
a value was easy. To associate a function name with its definition
required some care. That is, part of the definition of a function
involves the proper association of formal parameters with the
body of the definition. (We actually need to be a bit more
precise than this in describing recursive functions, but this is
good enough for now.) The device we chose is called the %2lambda
notation%*.
.SS(%3λ%*-notation,%3λ%*-notation,P49:)
.BEGIN TURN ON "←";
Recall our discussion of the problems of representation of function
definitions. This discussion began on {yon(P124)} and our conclusion was
that to represent a definition like %3f[x;y]#<=#%9x%1
we needed a symbol table entry with name %3f%* and a value part which
contained the body of the definition, %9x%1, and the
list of arguments, %3[x;y]%*, given with %3f%*.
LISP uses the λ-notation to lend precision to our informal
discussion of function representation.
Why add more notation to LISP?
The λ-notation is a device invented by the logician Alonzo Church in
conjunction with his studies in logic and foundations of mathematics.
The λ-calculus is useful for a careful discussion of functions
and is therefore applicable in a purified discussion of procedures in programming
languages. We shall begin a detailed discussion of the
λ-calculus and its relation to computer science in {yonss(P85)}.
The notation was introduced into Computer Science by John McCarthy in
the description of LISP.
In the interest of precision we should note that there are actually several
important distinctions between Church's λ-calculus and the notation envisioned
by McCarthy. We will point out the discrepancies in {yonss(P85)}.
We have been informally writing %3f[x;y]#<=#[x*y#+#y]%* as
a definition of the function %3f%*. It is supposed to convey the following
intent: %3f%1 is the name of a function or rule; whenever %3f%1 is supplied
with two numeric arguments it is supposed to multiply those arguments
and add the result to the second. The resulting sum is the desired answer.
Since informality has tendencies toward ambiguity we would like to
analyze the "<="-notation more closely. Though we say %3f%1 is being
defined, it is not %3f%1, but %3f[x;y]%1 which appears to the left of the
"<="-symbol.
First note that %3f[x;y]%* is %2not%* a function, %3f%* is.
To see what %3f[x;y]%* means consider the following example.
When we are asked to evaluate %3car[(A#.#B)]%* we say the value is %3A%*.
%3car[(A#.#B)]%* is an expression to be evaluated; it is a LISP ⊗→form↔←.
If %3car[(A#.#B)]%* %2is%* a form then so is %3car[x]%*;
only now the value of the
form depends on the current value assigned to the variable %3x%*.
So the %2⊗→function↔←%* is %3car%*; the %2form%* is %3car[x]%*.
The function is %3f%*; %3f[x;y]%* is a form, as is %3x*y#+#y%*.
Thus the current notation has a form on both sides of the "<=".
We would like a notation which clearly shows what is being defined and
what is given.
Also our notation has really been specifying more than just the name.
The notation specifies the formal parameters (%3x%* and %3y%*) and
the order in which we are to associate actual parameters in a call
with the formal parameters of the definition (%3x%* with the first, %3y%* with the second).
More subtly, the notation tells %2which%* variables in the function body
are to be supplied values when the
function is called. For example define %3g[x]#<=#[x*y#+#y]%*; then the
expression %3g[2]%* specifies that %3x%1 is to receive a value %32%1, but
leaves unspecified what the value of %3y%1, and indeed + and *, should be.
We also wish to have a notation
so that function definitions can be inserted into the symbol
table as "values" assigned to names.
They will be parametric values, but they will be values.
The λ-notation performs this task by preceding the function body with
a list of variables, called %2⊗→lambda variables↔←%*⊗↓The
list of variables is usually referred to as the %2lambda list%*.←;
The resulting construct is preceded by "λ[" and followed by "]".
Thus using the above example, the name %3f%* is exactly the
same function as %3λ[[x;y]#x*y#+#y]%*.
We actually need a bit more than
λ-notation to specify recursive functions in a natural manner. See {yonss(P38)}.
The λ-notation introduces nothing new as far as our intuitive binding and
evaluation processes are concerned; it only makes these operations more
clear.
One benefit of the
λ-notation is that we need not give explicit names to functions in order to
perform the evaluation. Evaluation of such anonymous functions should be
within the province of LISP.
To evaluate a λ-expression in LISP, bind the evaluated actual parameters
to the λ-variables and evaluate the function body.
For example, evaluate:
←%3λ[[x;y] x%82%* + y][2;3] %1.
We associate %32%* with %3x%* and %33%* with %3y%* and evaluate the expression:
←%3x%82%* + y%1.
This calculation should give %37%* as value.
Or evaluate: ← %3λ[[x] cdr[car[x]]][((A . B). C)]%*.
We bind %3x%* to the
S-expression %3((A#.#B).#C)%* and evaluate the function body. The usual
LISP evaluation scheme entails evaluating %3car[x]%* with the current
binding of %3x%*; this result, %3(A#.#B)%*, is passed to %3cdr%*.
%3cdr%* performs its calculation and finally returns %3B%*.
The λ-notation can be used anywhere LISP expects to find a function, thus
allowing us to write anonymous functions.
For example:
←%3λ[[x] first[x]][λ[[y] rest[y]][(A B)]]%1
This expression is a complicated way of writing:
←%3f[g[(A B)]]%* where %3f[x] <= first[x] %*and %3g[y] <= rest[y]%*.
Though the second form is perhaps easier for us to comprehend, the
first form %2is%* equivalent and will be acceptable to the evaluator we will
write. Indeed the mechanical evaluation of the second formulation will pass
through the first on its way to complete evaluation. At any rate:
←%3λ[[x] first[x]][λ[[y] rest[y]][(A B)]] = λ[[x] first[x]][(B)] = B%1
Still, evaluation requires
care. For example, is %3λ[[x]2]%* the constant function which always
gives value %32%*? It isn't in LISP.
The evaluation of an expression involving
this function requires the evaluation of the actual parameter associated with %3x%*.
That computation may not terminate. For example, consider %3λ[[x]2][fact[-1]]%*
where %3fact%* is the LISP implementation of the factorial function given on
{yon(P20)}.
Since we intend to include λ-expressions in our language we must include
an %eR%*-mapping of them into S-expression form:
.BEGIN GROUP;centerit;
←%eR%f( %3λ[[x%41%*; ...; x%4n%*]%9x%*] %f)%3 = (LAMBDA (X%41%* ... X%4n%*) %eR%f( %9x%3 %f)%3)%1
.END
Thus the character λ will be translated to %3LAMBDA%*. %3LAMBDA%* is therefore a
kind of prefix operator (like %3COND%*) which will help the evaluator recognize
the occurrence of a function definition (just as %3COND%* will be used by the
evaluator to recognize the occurrence of a translation of a conditional
expression).
Here are some examples of %3λ%*-expressions and their %eR%*-translations:
.BEGIN CENTERIT;GROUP;
←%eR%f( %3λ[[x;y] x%82%* + y] %f)%3 = (LAMBDA (X Y) (PLUS (EXPT X 2) Y))
←%eR%f( %3λ[[x;y] cons[car[x];y]] %f)%3 = (LAMBDA (X Y) (CONS (CAR X) Y))
.END
To make our discussion of λ-expressions completely legitimate,
our LISP syntax equations should now be augmented to include:
.BEGIN TABIT1(11);GROUP;
<function>\::= λ[<varlist><form>]
<varlist>\::= [<variable>; ... ; <variable>]
\::= [ ]
.END
Besides giving a clear notation for function definitions, the λ-notation
is a useful computational device.
Consider the following sketch of a function definition:
←%3g <= λ[[x][%9p%*[lic[x]] → lic[x]; .... x ...]]%1,
where %3lic%* may be a %dl%*ong %di%*nvolved %dc%*alculation.
We certainly must compute %3lic[x]%* %2once%*. But as %3g%* is defined,
we would compute %3lic[x]%* %2twice%* if p%41%* is true: once
in the calculation of p%41%*, and once as e%41%*. Since
both calculations of %3lic[x]%* will give the same value⊗↓Our
current LISP subset has no side effects. That means
there is no way for a computation to affect its surrounding
environment. The most common construct which has a side-effect
is the assignment statement.←, this second calculation
is unnecessary. %3g%* is inefficient. We could write:
←%3g <= λ[[x]f[lic[x];x]]%1
where:
←%3f <= λ[[u;v][%9p%*[u] → u; .... v ...]]%1.
.P170:
In this scheme %3lic%* %2will%* only be evaluated once; its value
will be passed into %3f%*.
Using λ-expressions, in a style
called %2⊗→internal lambdas↔←%* we can improve %3g%* without
adding any new function names to our symbol tables as follows:
Replace the body of %3g%* with:
%aLAM← %3λ[[y][%9p%*[y] → y; ... x ...]][lic[x]]%1.
Call this new function %3g'%*:
.BEGIN TABIT2(10,18);GROUP;
\%3g' <= λ[[x]#λ[[y][%9p%*[y] → y; ... x ... ]][lic[x]]#] %1.
.END
Now when %3g'%* is called we evaluate the actual parameter, binding it
to %3x%*, as always; and evaluate the body, %aLAM%*. Evaluation of %aLAM%*
involves calculation of %3lic[x]%* %2once%*, binding the result to
%3y%*. We then evaluate the body of the conditional expression as before.
If p%41%* %2is%* true, then this definition of %3g'%* involves
one calculation of %3lic[x]%* and two table look-ups (for the value of %3y%*),
rather than the two calculations of %3lic[x]%* in %3g%*.
More conventional programming languages can obtain the same effect as
this use of internal lambdas by assignment of %3lic[x]%* to a
temporary variable. We will introduce assignment statements in LISP in
{yonss(P37)}.
.END
.CENT(Problems);
I. What is the difference between %3λ[[ ] x*y + y]%* and %3x*y + y%*?
.SS(Mechanization of evaluation,,P6:)
%1
We now have picked a representation for LISP expressions; have
introduced a precise notation for discussing functions; and we
have given plausibility arguments for the existence of an evaluator for
LISP. It is now time to write a formal, precise, evaluator for LISP expressions.
The evaluator will be the final arbiter on the question of the
meaning of a LISP construct. The evaluator is thus a very important
algorithm. We will express it and its related functions in as
representation-free form as possible, but we will always have
our Cambridge polish
representation in the back of our minds.
As we have said, %3tgmoaf%* and %3tgmoafr%* ({yonss(P39)}) are evaluators
for subsets of LISP.
Armed with our symbol-table mechanism we could now extend the
%2great-mothers%* to handle variable look-ups.
Rather than do this we will display our burgeoning cleverness
and make a total revision of the structure of the
evaluators. In making the revision, the
following points should be remembered:
1. Expressions to be evaluated can contain variables, both simple
variables and variables naming λ-expressions. Thus evaluation must be done
with respect to an environment or symbol table. We wish to recognize other
(translations of) function names besides %3CAR, CDR, CONS, EQ,%* and%3 ATOM%*
in our evaluator, but
explicitly adding new definitions to the evaluator in the style of the
recognizers for the five primitives is a bad idea. Our symbol table should
hold the function definitions and the evaluator should contain the general
schemes for finding the definitions, binding variables to values, and
evaluating the function body. By now you should be convinced that this
process is a reasonably well defined task and could be written in LISP.
2. All %2function%* calls are to be evaluated by-value. However, there are
some %2⊗→special forms↔←%* we have seen which are not evaluated in the
normal manner. In particular, conditional expressions, quoted expressions,
and lambda expressions are handled differently.
The primary function is named ⊗→%3eval%*↔← rather than %3sogm%* (Son of Great Mother).
It will take two arguments; the first
is a representation of an expression to be evaluated, and the second
is a representation of a symbol table. The function %3eval%* will recognize numbers,
and the
constants %3T%* and %3NIL%*, and if presented with a variable, will attempt to
find the value of the variable in the symbol table using %3assoc%* ({yonss(P92)}).
%3eval%* also handles the special forms %3cond%* and %3quote%*. If %3eval%* sees
a ⊗→conditional expression↔← (represented by %3(COND ...)%* ) then the body
of the ⊗→%3COND%*↔← is passed to a subfunction named ⊗→%3evcond%*↔←.
%3evcond%* embodies the conditional expression semantics as described on {yon(P40)}.
The representation,
%3(QUOTE#%9α%3)%1, signifies the occurrence of a constant, %9α%1, which is
simply returned.
As far as this %3eval%* is concerned, any other expression is a call-by-value function
application.
The argument-list evaluation is handled by %3evlis%* in the authorized
left-to-right ordering. This is handled by recursion on the sequence of arguments.
In this case we %2apply%* the function
to the list of evaluated arguments.
This is done by the function %3apply%*.
With this short introduction we will now write a more general evaluator
which will handle a larger subset of LISP than the %3tgm%*s.
Here's the new %3eval%*:
.BEGIN SELECT 3;GROUP;TABIT1(7);
eval <= λ[[exp;environ]
\[isvar[exp] → lookup[exp;environ];
\ isconst[exp] → denote[exp];
\ iscond[exp] → evcond[rest[exp];environ];
\ isfunc+args[exp] → apply[func[exp];evlis[arglist[exp];environ];environ] ]]
.APART
.GROUP
%1and:%*
lookup <=λ[[var;env] value[assoc[var;env]]]
denote <= λ[[exp]
\[isnumber[exp] → exp;
\ istruth[exp] → exp;
\ isfalse[exp] → exp;
\ isSexpr[exp] → rep[exp] ]]
.GROUP
%1where:%*
.BEGIN FILL;INDENT 0,7;
%3rep%1 knows how to extract the S-expr from the representation. In our scheme
the selector %3rep%1 is given by %3cadr%1.
.END
evcond <= λ[[e;environ]
\[eval[ante[first[e]];environ] → eval[conseq[first[e]];environ];
\ %et%* → evcond[rest[e];environ] ]]
.APART
.GROUP
%1and,%*
.P191:
evlis <= λ[[e;environ]
\[null[e] → ( );
\ %et%* → concat[eval[first[e];environ];evlis[rest[e];environ]] ]]
.END
The selectors, constructors and recognizers which relate this abstract
definition to our particular S-expression representation are grouped with
%3apply%* on {yon(P108)}.
The subfunctions, %3evcond%* and %3evlis%*, are simple. %3evcond %*you've seen
before in %3tgmoafr%* in a less abstract form;
%3evlis%* simply manufactures a new list consisting of the
results of evaluating (from left to right) the elements of %3e%*, using the
symbol table, %3environ%*, where necessary.
Since %3evcond%1 and %3evlis%1 are LISP functions, %2they%1 are subject to the
left-to-right evaluation rule. Thus %3evlis%1 embodies the left-to-right rule.
If %3evlis%1 were evaluated under a right-to-left rule then %3evlis%1
would evaluate expressions in right-to-left order.
It is possible to write a version of %3evlis%1 which only depends on
being evaluated %aCBV%1, and which does embody the left-to-right rule:
.BEGIN TABIT2(7,12);SELECT 3;group;
evlis <= λ[[e;environ]
\[null[e] → ( );
\ %et%* → λ[[x]concat[x;evlis[rest[e];environ]]]
\\[eval[first[e];environ]] ]]
.END
The function ⊗→%3apply%*↔← takes three arguments: a representation of a function,
a representation of a list of evaluated arguments,
and a representation of a symbol table.
%3apply%* explicitly recognizes the representations of the five
primitive functions %3CAR,
CDR, CONS, EQ, %*and%3 ATOM%*. If the function name is a variable,
the
definition is located in the symbol table by %3eval%* and applied to the arguments.
Otherwise the function must be a λ-expression. This is where things
get interesting. We know we must evaluate the body of the
λ-expression after binding the formal parameters of the λ-expression
to the evaluated arguments. How do we ⊗→bind↔←? We add variable-value
pairs to the symbol table. We will define a subfunction, %3mkenv%*, to
perform the binding. Then all that is left to do is give the function
body and the new symbol table to %3eval%*. Here is %3apply%*:
.BEGIN SELECT 3;GROUP;TABIT1(15);
.P69:
apply <= λ[[fn;args,environ]
\[iscar[fn] → car[arg%41%*[args]];
\ iscons[fn] → cons[arg%41%*[args];arg%42%*[args]];
\ ... ...
\ isvar[fn] → apply[eval[fn;environ];args;environ];
\ islambda[fn] → eval[body[fn];mkenv[vars[fn];args;environ]] ]]
mkenv <= λ[[vars;vals;environ] pairlis[vars;vals;environ]]
pairlis <= λ[[vars;vals;environ]
\[null[vars] → environ;
\ %et%* → concat[mkent[first[vars];first[vals]];pairlis[rest[vars];rest[vals];environ]] ]]
.END
Some of the functions and predicates which will relate these abstract definitions to our
specific S-expression representation of LISP constructs are:
.BEGIN TABIT3(5,33,53);GROUP;SELECT 2;
.P108:
\Recognizer\Selector\Constructor
.SELECT 3;
iscar <= λ[[x]eq[x;CAR]]\func <= λ[[x]first[x]]\mkent <= λ[[x;y]cons[x;y]]
isSexpr <= λ[[x]eq[first[x];QUOTE]]\arglist <= λ[[x]rest[x]]
istruth <= λ[[x] eq[x;T]]\body <= λ[[x]third[x]]
\\vars <= λ[[x]second[x]]
.END
The only really new development is in the λ-binding process.
Another application of the left-to-right property occurs here,
within %3apply%*, in the symbol table
search and construction process. Notice that
%3lookup%1 uses %3assoc%* to look from left to right
for the latest binding of a variable. Thus the function which %2augments%* the
table must add the latest binding to the %2front%*. When do new bindings
occur? They occur at λ-binding time, and at that time the function
%3mkenv%1, using %3pairlis%*, will
build an augmented symbol table with the λ-variables bound to their evaluated
arguments. The
functions %3lookup%1 and %3mkenv%1 operate together. We will see representations
of these functions other than %3assoc%1 and %3pairlis%1. The actual search and
construction operations will change, but the critical relationship that %3mkenv%1
always builds a table compatable with the search strategy of %3lookup%1
will be maintained. %2This is important.%*
.P86:
To summarize then: the evaluation of an expression %3f[a%41%*;#...#;a%4n%*]%1,
where the a%4i%*'s are S-exprs,
is the same as the result of applying %3eval%* to the %eR%*-translation,
%3(%eR%f(%3#f#%f)%3,#%eR%f(%3#a%41#%f)%3,#...#%eR%f(%3#a%4n#%f)%3).%1
This behavior is again an example of the diagrams of {yon(P46)}. In its
most simple terms, we mapped LISP evaluation onto the LISP %3eval%* function;
mapped LISP expressions onto S-expressions; and executed %3eval%*. Notice
that in this case we do not reinterpret the output since the structure of the
representation does this implicitly. We have commented on the efficacy
of this already on {yon(P110)}.
This specification of the semantics of LISP using %3eval%* and %3apply%*
is one of the most interesting developments of computer science.
.CENT(Problems)
I. Compare our version of %3eval%* and %3apply%* with the version given in the appendix.
Though the current version is much more readable, how much of it %2still%* depends
on the representation we chose? That is, how abstract is it really?
II. Complete the specification of the selectors, constructors, and recognizers.
.SS(Examples of %3eval%*,examples of %3eval%*)
We will demonstrate the inner workings of the evaluation algorithm
on a couple of samples and will describe the flow of control in the
execution in a couple of different ways.
The examples will be done in terms of the image of the %eR%*-mapping
rather than being done abstractly. We do this since the structure of an
actual LISP evaluator will use this representation⊗↓Recall that we will
be programming in the %eR%*-image.←.
It is important that
you diligently study the sequence of events in the execution
of the evaluator. The process is detailed and tedious but it must be done
once so that you may convince yourself of its correctness.
Let's evaluate %3f[2;3]%* where %3f <= λ[[x;y] x%82%* + y].%1
That is, evaluate:
.BEGIN TURN OFF "{,}";
←%3eval[ %eR%f(%3 f[2;3] %f)%3; %eR%f(%3{ <f, λ[[x;y] +[↑[x;2]; y]]> }%f)%3]%1.
.END
After appropriate translation this is equivalent to evaluating:
←%3eval[(F 2 3); ((F . (LAMBDA (X Y) (PLUS (EXPT X 2) Y))))]%*
%2Notes:%*
%21.%* %3((F . (LAMBDA (X Y) ... ))) = ((F LAMBDA (X Y) ... )) %*
This is mentioned because LISP implementations will print the latter
even if you write the former.
%22.%* Since the symbol table %3((F ...))%* occurs so frequently in the following
trace, we will abbreviate it as %3st%*. Note that we have no mechanism yet
for permanently increasing the repertoire of known functions. We must
resort to the subterfuge of initializing the symbol table to get the function
%3f%* defined.
%23.%* For this example we must assume that + and ↑ (exponentiation) are known functions. Thus
%3apply%* would have to contain recognizers for %3PLUS%* and %3TIMES%*:
.BEGIN NOFILL;
.GROUP
%3
... atom[fn] → [isplus[fn] → +[arg%41%*[args];arg%42%*[args]];
isexpt[fn] → ↑[arg%41%*[args];arg%42%*[args]];
... ]
...
.APART
%1
.END
.BEGIN NOFILL;TABS 10,25;TURN ON "\";
.P129:
.GROUP
So %3eval[(F 2 3);st]
\= apply[func[(F 2 3)]; evlis[arglist[(F 2 3)];st];st]
\= apply[F;evlis[(2 3);st];st]
\= apply[F;(2 3);st]
\= apply[eval[F;st];(2 3);st]
\= apply[(LAMBDA (X Y) (PLUS (EXPT X 2) Y)); (2 3);st]
\= eval[body[(LAMBDA (X Y) (PLUS (EXPT X 2) Y))];
\ mkenv[vars[(LAMBDA (X Y) (PLUS (EXPT X 2) Y))];(2 3);st]]
\= eval[(PLUS (EXPT X 2) Y);pairlis[(X Y);(2 3);st]]
\= eval[(PLUS (EXPT X 2) Y);((X . 2)(Y . 3)(F LAMBDA (X Y) ...))]
\= apply [PLUS; evlis[((EXPT X 2) Y);((X . 2)(Y . 3)..)];((X . 2)...)]
\ %1Let's do a little of:%3 evlis[((EXPT X 2) Y);((X . 2)(Y . 3)...)]
\\= concat[eval[(EXPT X 2);((X . 2)(Y . 3) ...)];
\\ evlis[(Y);((X . 2) ...)]]
\\= concat[apply[EXPT;evlis[(X 2);((X . 2)...)];((X . 2) ...]
\\ evlis[(Y); ...]]
\\= concat[apply[EXPT;(2 2);((X . 2) ...];evlis[(Y); ...]]
\\= concat[↑[arg%41%*[(2 2)];arg%42%*[(2 2)]];evlis[(Y); ... ]]
\\= concat[↑[2;2];evlis[(Y); ... ]]
\\= concat[4;evlis[(Y);((X . 2)(Y . 3) ....)]]
\\= concat[4;concat[eval[Y;((X .2) ...)]; evlis[( );(( ...))]]]
\\= concat[4;concat[3;( )]]
\\= (4 3)%1
Now back to apply:
%3
\= apply[PLUS;(4 3);((X . 2)(Y . 3) ... )]
\= +[4;3]
\= 7%1
.APART
.END
It should now be clear that %3eval%* and friends %2do%*
perform as you would expect. It perhaps is not clear that a
simpler scheme might not do as well. In particular, the complexity
of the symbol table mechanism which we claimed was so important
has not been exploited. The next example will indeed show that a
scheme like ours is necessary to keep track of variable bindings.
.GROUP
Let's sketch the evaluation of %3fact[3]%* where:
←%3fact <= λ[[x][x = 0 → 1;%et%* → *[x;fact[x-1]]]];%1
.P42:
that is, %3eval[(FACT 3);st] %*where %3st%* names the initial symbol table:
←%3((FACT .(LAMBDA (X) (COND ((ZEROP X) 1) (T (TIMES X (FACT (SUB1 X)))))))).%1
.APART
In this example we will assume that the binary function *, the
unary predicate %3zerop#<=#λ[[x]x#=#0]%* and unary function
%3 sub1#<=#λ[[x]#x-1]%* are known and are
recognized in the evaluator as %3TIMES, ZEROP%* and %3SUB1%* respectively.
.BEGIN NOFILL;TURN ON "\";TABS 10,20,30;
.GROUP
%1Then%3 eval[(FACT 3);st]
\= apply[FACT;evlis[(3);st];st]
\= apply[(LAMBDA (X) (COND ...));(3);st]
\= eval[(COND ((ZEROP X) 1) (T ( ...)));((X . 3) . st)]
\= evcond[(((ZEROP X) 1) (T (TIMES X (FACT (SUB1 X)))));((X . 3) . st)]
%1(Now, let %3st1%* be%3 ((X . 3) . st) )
\= eval[(TIMES X (FACT (SUB1 X))); st1]
\= apply[TIMES;evlis[(X (FACT (SUB1 X))); st1];st1]
\= apply[TIMES;concat[3;evlis[((FACT (SUB1 X))); st1]];st1]
%1Now things get a little interesting inside %*evlis:
\evlis[((FACT (SUB1 X)));st1]
\\= concat[eval[(FACT (SUB1 X)); st1];( )]
\\%1 and %* eval[(FACT (SUB1 X));st1]
\\\= apply[FACT;evlis[((SUB1 X));st1];st1]
\\\= apply[FACT; (2);st1]
\\\= apply[(LAMBDA (X) (COND ...));(2);st1]
\\\= eval[(COND ((ZEROP X) 1) ...));((X . 2) . st1)]
\\\...
.APART
.FILL
%1
Notice that within this latest call on %3eval%* the symbol-table-searching function,
%3lookup%*, will find the pair %3(X . 2)%* when looking for the value of %3x%*. This is
as it should be. But notice also that the older binding, %3(X . 3)%*, is still
around in the symbol table %3st1%*, and will become accessible once we complete
this latest call on %3eval%*.
It will become accessible because this earlier manifestation of the table was
saved by the λ-binding process as we entered the inner call on %3eval%1; as we
leave this inner evaluation, the previous incarnation of the table is restored.
.GROUP
As the computation continues, the current symbol table appears as follows:
.NOFILL
%3
←((FACT LAMBDA (X) (COND ...))) = st,
←((X . 3) . st) = st%9'%3,
←((X . 2) . st%9'%3) = st%9''%3,
←((X . 1) . st%9''%3) = st%9'''%3,
←((X . 0) . st%9'''%3).
%1
.FILL
.APART
Thus each new level of the table builds on the prior %3st%1; each prior %3st%1
is saved in the following line from %3apply%1 ({yon(P69)}):
.BEGIN CENTERIT;SELECT 3;
←islambda[fn] → eval[body[fn];mkenv[vars[fn];args;environ].
.END
The call on %3eval%1 is performed with the augmented table; when we leave that
inner %3eval%1 we return to an environment which contains the prior %3st%*.
We claim that using %3mkenv%* to concat the new
bindings onto the front of the symbol table as we call %3eval%*
does the right thing.
The tricky part is that when we leave that particular call on
%3eval%*, the old table is automatically restored by the recursion
mechanism. That is, if %3st%* is the current symbol table then
%3concat%*-ing things onto the front of %3st%* doesn't change %3st%*, but if we
call %3eval%* or %3apply%* with a symbol table of say:
←%3concat[(X . 2);concat[(X . 3); st]] %*
then in %2that%* call on %3eval%* or %3apply%* we have access to %3x = 2%*, not
%3x = 3%*.
In this representation,
the search function %3lookup%1 always proceeds from left to right through
the table and, since the table entry function %3mkenv%* always %3concat%*s pairs
onto the left of the table before %3eval%* is called, we will get the correct
binding of the variables.
.P212:
The structure of %3mkenv%1 should be analyzed further: it takes a formal
parameter list, an evaluated actual parameter list, and an environment, as
its arguments; it allocates a new block to contain the name-value pairs and
proceeds to send each name-value pair to its proper slot in the block. The value
of %3mkenv%1 is the newly constructed environment formed by
linking the new block onto the front of the old environment.
It turns out that %3pairlis%1
is able to combine the action of making the new block and filling the slots.
.BEGIN GROUP;
A more accurate picture of the abstract behavior of %3mkenv%1 is:
.TABIT2(29,39);SELECT 3;
.P213:
mkenv <= λ[[vars;vals;env] mkenv%9'%3[vars;vals;alloc[vars];env]]
mkenv%9'%3 <= λ[[vars;vals;block;env]\[null[vars] → link[block;env];
\ %et%3 → mkenv%9'%3[\rest[vars];
\\rest[vals];
\\send[first[vars];first[vals];block];
\\env] ]]]
.END
.BEGIN CENTERIT;group
Our current implementation of %3pairlis%1 is equivalent to:
←%3alloc <=λ[[x] ( )] ⊗↓%3alloc%1 is defined as a unary function even though
its argument is ignored here. This generality is in anticipation of future
binding implementations.←
←%3send <= λ[[var;val;block] concat[mkent[var;val];block]]
←link <= λ[[block;env] append[block;env]]
.END
The computational behavior of %3pairlis%1 is slightly different: the
name value pairs are added to the environment in a different order. Since
the variables in the λ-list are required to be distinct from one another,
the resulting environment is equivalent.
Symbol table manipulation is very important, so let's look at it
again in a slightly different manner.
.END
.<<the | | symbol table >>
.P43:
In this example,
expressions and table entries will be written more informally.
Since the evaluator is operating on the S-expr representation of expressions
we should continue to present these arguments to %3eval%1 as S-exprs.
However, the object being represented is frequently more understandable
and readable
than the representation of that object. Thus, initially, we
will write %3quote[%9x%3]%1⊗↓%1See {yon(P44)} for %3quote%1.←
rather than the explicit %eR%1-image of %9x%1; for example, write %3quote[fact[3]]%1
rather than %3(FACT#3)%1.
Later we will simply write %9x%1, without the %3quote%1 where no confusion is
likely.
With similar motivation,
we represent the symbol table between vertical
bars, "|", in such a way that if a table, t%41%*, is:
.BEGIN TABIT2(10,17);GROUP
\| b%4n%*\|
\| ...\| then %3concat%*ing a new element, b%4n+1%* onto t%41%* gives:
\| b%41%*\|
\| b%4n+1%*\|
\| b%4n%*\|
\| ...\|
\| b%41%*\|
.END
Note that the elements of the table should also be presented as
S-exprs. We will represent the entires in a more transparent form as
simple equations. Thus, for example:
%3
.BEGIN NOFILL;TABS 10,56,68;TURN ON "\";select 3;
eval[quote[fact[3]]; | fact = λ[[x][x=0 → 1;%et%* → *[x;fact[x-1]]]] | ]
.GROUP
\= eval[quote[[x=0 → 1; %et%* → *[x;fact[x-1]]]];\|x = 3 |
\\|fact = λ[[ ... | ]
.APART
.GROUP
\= *[3;eval[quote[[x=0 → ...]; \| x = 2 |
\\| x = 3 |
\\|fact = λ[[ ... | ]
.APART
.GROUP
\= *[3; *[2;eval[quote[[x=0 → ...];\| x = 1 |
\\| x = 2 |
\\| x = 3 |
\\|fact = λ[[ ... | ]
.APART
.GROUP
\= *[3; *[2; *[1;eval[quote[[x=0 → ...];\| x = 0 |
\\| x = 1 |
\\| x = 2 |
\\| x = 3 |
\\|fact = λ[[ ... | ]
.APART
.GROUP
\= *[3; *[2; *[1;1]]] %1with:%*\| x = 1 |
\\| x = 2 |
\\| ... |
.APART
.GROUP
\= *[3; *[2;1]] %1with:%*\| x = 2 |
\\| ... |
.APART
.GROUP
\= *[3;2] %1with:%*\| x = 3 |
\\| ... |
\= 6. %1with:%*\|fact = λ[[ ... |.
= 6
.END
%1
Notice that after we went to all the trouble to save the old values
of %3x%* we never had to use them. However, in the general case of
recursive evaluation we must be able to save and restore the old values
of variables.
For example, recall the definition of %3equal:
.BEGIN NOFILL TURN ON "\";TABS 17;
.GROUP
%3
equal <= λ[[x;y]
\[atom[x] → [atom[y] → eq[x;y]; %et%* → %ef%*];
\ atom[y] → %ef%*;
\ equal[car[x];car[y]] → equal[cdr[x];cdr[y]];
\ %et%* → %ef%*]]
.APART
%1If we were evaluating:%3
.CENTERIT
←equal[((A . B) . C);((A . B) . D)],
%1
then our symbol table structure would change as follows:
.END
%3
.BEGIN NOFILL;TABS 10,40;TURN ON "\";
.GROUP
\|equal = λ[[x;y] ...| ==>\|x = ((A . B) . C)| ==>
\\|y = ((A . B) . D)|
\\|equal = λ[[x;y]...|
.APART
.GROUP
\| x = (A . B) |\| x = A |
\| y = (A . B) |\| y = A |
\| x = ((A . B). C) | ==>\| x = (A . B) |
\| y = ((A . B). D) |\| y = (A . B) | ==>
\|equal = λ[[x;y ... |\| x = ((A . B). C) |
\\| y = ((A . B). D) |
\\|equal = λ[[x;y] ...|
.APART
.GROUP
\| x = B |\| x = C |
\| y = B |\| y = D |
\| x = (A . B) |\| x = ((A . B). C) | ==>
\| y = (A . B) | ==>\| y = ((A . B). D) |
\| x = ((A . B). C) |\|equal = λ[[x;y] ... |
\| y = ((A . B). D) |
\|equal = λ[[x;y] ... |
.CENTERIT
←|equal = λ[[x;y] ... |
.END
%1
This complexity is necessary, for while we are evaluating
%3equal[car[x];car[y]]%*, we rebind %3x%* and %3y%* but we must save the old
values of %3x%* and %3y%* for the possible evaluation of %3equal[cdr[x];cdr[y]].%*
.P93:
Before moving on we should examine %3eval%* and %3apply%* to see how they
compare with our previous discussions of LISP evaluation.
The spirit of call-by-value and conditional expression evaluation is maintained.
λ-binding seems correct, though our current discussion is not complete.
At least one preconception is not maintained here. Recall the discussion on
{yon(P95)}. We wanted n-ary functions called with exactly n arguments. An
examination of the structure of %3eval%* and %3apply%* shows that
if a function expecting %2n%* arguments is presented with fewer, then
the result is undefined; but if it is given %2more%* arguments than necessary
then the calculation is performed. For example:
.BEGIN TABIT1(33);GROUP;SELECT 3;
eval[(CONS(QUOTE A)(QUOTE B)(QUOTE C));NIL]
\= eval[(CONS(QUOTE A)(QUOTE B));NIL]
\= (A . B)
.END
This shows one of the pitfalls in defining a language by an evaluator.
If the intuitions of the language specifiers are faulty or incomplete
then either we are faced with maintaining that faulty judgement,
or we must lobby for a "revised report".
The definition of a language by an evaluator written in that language
is subject to other criticisms. The troublesome areas of our description of
LISP's evaluation included λ-binding, calling styles in general and call-by-value
in particular, and left-to-right order of evaluation. We wrote %3eval%1 to
explicate the meaning of these constructs, yet within %3eval%1 we often relied
on exactly these constructs to convey our intent. Now, our description
in not entirely circular; %3eval%1 does convey much of our intention to
the reader. However much of the discussion of %2how%1 these constructs operate
is either implicit or is explained by using the same constructs.
In gaining a clearer understanding of what LISP constructs mean, %3eval%1
is exemplary. Indeed many of the details of how these constructs work
are irrelevant to such an understanding. However, when we attempt to
implement a language feature we cannot assume the existence of that
feature; the implementation must be prepared from a combination of more primitive
components. As we procede through the text we will make explicit
the mechanisms which are necessary to correctly implement LISP's constructs
and indeed the constucts of most other languages. In {yonss(P187)} we
give several alternative algorithms for %3eval%1. They will evolve to an %3eval%1
which makes explicit most of the mechanisms we need. In {yonsec(P188)}
we will begin to discuss efficient representations for LISP's data structures,
control structures, and primitive operations.
The remainder of this chapter will explicate further features of LISP
in preparation for that discussion.
.CENT(Problems)
%2I%* Which parts of %3eval%* and Co. allow correct evaluation of functions
applied to too many arguments?
%2II%* On {yon(P93)} we noted that the evaluator performs "correctly"
when evaluating forms like %3cons[A;B;C]%*.
Can you find other anomalies in %3eval%* and friends? That is, places
where unexpected results are obtained?
.SS(Variables,variables,P135:)
.BEGIN TURN ON "#";
Let's look more closely at λ-binding in %3eval%*.
The scheme presented seems reasonable,
but as in the case %3cons[A;B;C]%*, there may be more going on here
than we are perhaps aware of.
If we asked %3eval%*
to compute %3f[2]%*, given a representation for %3f#<=#λ[[x]#x#+#y]%1
but no represenation for the value of %3y%1
it would complain. It would
it would find %3f%1,
bind %32%* to %3x%*;
it would begin the evaluation of the body of %3f%1, finding %3x%1's value,
but then would find no value for
%3y%*. However, if we asked it to evaluate the form %3λ[[y]f[2]][1]%*
it %2would%* work. It would find the value of %3y%* to be
%31%* and would get a final answer of %33%* as expected. You should convince
yourself of this assertion.
Within the evaluation of %3f[2]%* in %3λ[[y]f[2]][1]%1
the variable %3y%* has a different character from that of %3x%*.
The value of %3x%1 is found within the latest λ-binding, whereas %3y%1
was bound in a dynamically surrounding λ-binding. That is, the λ-expression
which bound %3y%1 took effect before the binding of %3x%1 and is still in effect
when the binding of %3x%1 is made. We do have access to %3y%1's binding in
this case; the %3lookup%1 routine will locate %3y%1's value.
There is a third kind of name-value association present in these examples:
we expect that the symbol "+" is recognized during the evaluation as denoting
a procedure for computing the sum of two numbers. In previous discussions
we have assumed that "+" was pre-defined inside %3apply%1 and therefore
explicitly recognized. Finally, in the first example, a fourth kind of
variable usage occurred. The variable %3y%1 had no associated value when
the computation expected one. In this section we wish to examine
properties of variables.
The implementation of λ-bindings described in
%3pairlis%1 ({yon(P191)}) is slightly misleading. There, the new λ-bindings
are %3concat%1-ed onto the front of the existing table. They go on
in a one-at-a-time fashion even though they are to be thought of as
a logical unit: at the language level they all go on together, and
they all come off together. It is the structure of this table
which we should also examine. To these ends we now introduce some terminology.
Consider the evaluation of the expression:
.BEGIN CENTERIT;SELECT 3;
←λ[[y]equal[λ[[x]cons[x;y]][(A . B)];x]][A]
.END
in an environment where the definition
of %3equal%1 is known.
We evaluate the main argument %3A%1, and perform the λ-binding of %3A%1
to %3y%1. This operation of λ-binding creates what we call
a %2⊗→local symbol table↔←%1 and the variables bound in that local
table are called
%2⊗→local binding↔←s%1 for the body of the λ-expression.
We now begin the evaluation of the arguments
to %3equal%1. The first argument is itself an expression requiring λ-binding.
We evaluate it's argument and bind %3(A#.#B)%1 to %3x%1.
This creates a local binding
for %3x%1. In the process of making %3x%1 local what happens to %3y%1?
Notice that the binding process has not made %3y%1 inaccessible: we
can compute %3cons[x;y]%1 even though %3y%1 is not local. Variables like %3y%1
which are accessible, but not local, we call %2⊗→non-local↔←%1 variables.
Thus both %3y%1 and %3cons%1 are non-local variables in our evaluation
of %3cons[x;y]%1. There is a further distinction between %3y%1 and %3cons%1:
We expect %3cons%1 to be a predefined function; indeed %3cons%1 has not been
λ-bound any where in our computation. Variables like %3cons%1 we will call
%2⊗→global variable↔←s%1.
Global variables
include predefined function names, %3car%1, %3cdr%1, etc, and
variables like %3t%1 and %3nil%1.
A useful interpretation of global
variables is that they are bound in the initial symbol table, also
called the %2global table%1⊗↓This analogy
breaks down somewhat in that usual implementations of LISP allow this global table
to be augmented; for example, by function definitions using a version of "<=".
Thus the global table can be enlarged whereas
a true λ-binding involves a fixed number of variables.←.
Non-local variables which are λ-bound
somewhere in the symbol table we call
%2⊗→free variable↔←s%1, and variables which have %2some%1 accessible binding
at the current point in the computation are called %2⊗→bound variable↔←s%1⊗↓Our
notion of free
and bound variables has a decidedly computational flavor, in contrast
to the mathematical definitions of "free" and "bound".←.
Finally the first argument to %3equal%1 is evaluated giving %3((A#.#B)#.#A)%1.
As we complete that evaluation the local binding for %3x%1 is removed, and
%3y%1 becomes local again. We examine the second argument to %3equal%1, %3x%1,
and find there is now no binding for that variable. Variables which have no
binding of any kind at the time we ask for a value
are called %2⊗→unbound variable↔←s%1. The local, free, and global variables
make up the class of %2⊗→bound variable↔←s%1.
For a computation to be
meaningful, each variable which that computation references must be bound
%2somewhere%1 when we ask for its value. So the computation of our current example
would fail. Since we are doing call-by-value, it would fail even before we
asked for the definition of %3equal%1. One of our tasks will be to
discuss where definitions such as that for %3equal%1 should be kept.
Here is a diagram of our characterization of variables:
.BEGIN SELECT g;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#";
.TURN ON "∞" FOR "%";
∞1variables∞*
~
⊂ααααααα∀ααααααα⊃
~ ~
∞1non-local∞* ∞1local∞*
~
⊂αα∀ααααααπααααααααααα⊃
~ ~ ~
∞1free∞* ∞1global∞* ∞1unbound∞*
.END
Notice that a variable which is initially global may become
local and then non-local by virtue
of λ-bindings.
One of the difficulties in programming languages
is deciding what value to associate with a non-local variable.
In LISP, it is clear %2how%* values get associated; it happens through λ-binding
or by virtue of an initial entry in the symbol table.
The binding strategy for local variables is reasonably
uniform in programming languages: bind some form of the actual parameter⊗↓
evaluated or unevaluated← to the formal parameter and evaluate the body of the
definition.
The difficulty is that frequently there will be choices of values to
associate.
The scheme which LISP uses for discovering the value of any variable is to
proceed linearly down the symbol table, looking for the %2latest%* binding.
This scheme is called %2⊗→dynamic binding↔←%*. It %2usually%* results
in uncovering the value that is expected; but not always.
Conceptually, the dynamic binding scheme corresponds to the physical
replacement of the function call with the function body and then an
evaluation of the resulting expression.
In review, the evaluation of a typical function-call will involve
the evaluation of the arguments, the binding of the λ-variables
to those values, the addition of these new bindings to the front of the
symbol table,
and finally the evaluation of the body of the function.
That segment of the symbol table which we have just added by the λ-binding will be called
the %2local symbol table%* or local environment. The variables
which appear in that segment are the local variables.
The remainder of the symbol table makes up the %2non-local table%*.
Variables which appear in the global table but not in any local table
are the %2⊗→global variable↔←s%*.
Free variables are bound somewhere between the local table and the global table.
Variables which are local to a form-evaluation are those which were
present in the λ-binding.
We first wish to develop a useful notation for describing
bindings before delving further into the intricacies of binding strategies.
That discussion will be the content of {yonss(P134)}.
.END
.BEGIN GROUP;
.BEGIN SELECT 2;CENTERIT;
←Problems
.END
I Write a LISP predicate, %3non <= λ[[x;e] ... ]%*, which will give %et%*
just in the case that %3x%* and %3e%* represent a variable and a λ-expression
respectively, and %3x%* is non-local to %3e%*.
.END
.SS(Environments and bindings,,P77:)
.<< the Weizenbaum symbol table>>
This section will introduce one more notation
for describing symbol tables or environments. This notation, due to
J. Weizenbaum, only shows the abstract structure of the symbol table manipulations
during evaluation. Its simplicity will be of great benefit when we introduce
the more complex binding schemes necessary for function-valued functions in
{yonss(P76)}.
In the previous discussions it has been sufficient
simply to think of a symbol table as a sequence of pairs; each pair was a variable
and its associated value. This sufficed because we dealt only with
λ-variables; we ignored the possibility of free variables.
As long as we added the λ-bindings to the
%2front%* of the sequence representing the symbol table we showed that
expected evaluation would result. Local values were found in the table;
global values were found by explicit recognizers in %3eval%1 and %3apply%1.
With the advent of free variables, however it is convenient, and soon
will be necessary, to examine the structure of environments more closely.
We will describe our environments in terms of a local symbol table
augmented by a description of where to look for the non-local values.
Instead of having one amorphous sequential symbol table, we
envision a sequence of tables. One is the local table, and its
successor in the sequence is the previous local table. The
information telling where to find the previous table is called the
%2⊗→access chain↔←%* or %2⊗→access link↔←%1.
Thus if tables are represented by E%4i%1 and the access link
by %d→%* then we might represent a symbol table as:
.BEGIN CENTERIT;
←%3(%1E%4n%d → %1E%4n-1%d → %1 ...%d→ %1E%41 %d→%1 E%40%3)
.END
where %1E%4n%1 is the local or current segment of the table.
We reserve E%40%1 to name the global table.
LISP thus finds local bindings in the local
table and uses the access chain to find bindings of non-local variables.
If a variable is not found in any of the tables, then it is unbound.
Now to establish some notation.
An environment will be described as :
.BEGIN TABIT2(36,40);GROUP;
\\%aForm%*
\\E%4local%*
\ \| E%4i%*
\_________
\var\| value
\%3v%41%1\| %aval%41%1
\%3v%42%1\| %aval%42%1
\ .....
\%3v%4n%1\| %aval%4n%1
\\|
.END
%aForm%* is the current form being evaluated.
E%4local%* is the name of the current environment or symbol table.
Let %3x%* be a variable appearing in %aForm%*. If %3x%* is not found
among the %3v%4j%1's,
then entries in the table named E%4i%* are examined.
If the variable is not found
in E%4i%* then the environment mentioned in the upper right-hand quadrant
of E%4i%* is searched. The search will terminate if the variable is found;
the value is then the corresponding %aval%4j%1.
If %3x%* is not found locally, and the symbol "/" appears in
the right-hand quadrant, then %3x%* is unbound.
The notation is used as follows: when we begin the evaluation of a
form, the initial table E%40%* is
set up with "/" in its access field.
The execution of a function definition, say %3f#<=#λ[[x;y]x%82%*#+#y]%1, will
add an appropriate entry to the table, binding %3f%* to its lambda
definition⊗↓Note that we really mean "representation of lambda definition". However
the informal notation is easier to read and thus we will continue the deception.←.
Now, consider the evaluation of the form %3f[2;3]%*.
When the λ-expression is entered, i.e.,
when we bind the evaluated arguments (%32%* and %33%*) to the
λ-variables (%3x%* and %3y%*), a new local table (E%41%*) is set up
with an access link to E%40%*.
Entries reflecting the binding of the λ-variables are made in E%41%* and
evaluation of the λ-body is begun.
.BEGIN TURN ON "\";NOFILL;TABS 15,18,21,36,39,54,59,62,67,70,83;
.GROUP
The flow of symbol table creation is:
\\%3f[2;3]\\x%82%* + y%1
\\E%40%*\\\E%41%*\\\E%40%*
\ \| /\\ \| E%40%*\\ \| /
\\______ =>\______\=>\______
\ %3f\| λ[[x;y]x%82%* + y]%1 \%3x\| 2\ \f\| λ[[x;y] ...
\\\\ y\| 3%1
Compare this sequence to the example on {yon(P129)}.
.END
.BEGIN GROUP;
The sequence of tables corresponds to the evaluation sequence:
.BEGIN CENTERIT;GROUP;turn off "{","}";
←%3eval[%eR%f( %3f[2;3] %f)%3; %eR%f( %3{<f , λ[[x;y] x%82%* +y]>} %f)%3]
←↓
←%3eval[%eR%f( %3x%82%* + y %f)%3; %eR%f( %3{<x, 2>, <y, 3>, <f , λ[[x;y] x%82%* +y]>} %f)%3]
←↓
←%37%1
.END
.END
.FILL;
.<<RED'S FACT>>
.GROUP
The execution of %3fact[3]%* on {yon(P42)} results in a more interesting
example. The following discussion should be read in conjunction with that
description⊗↓The layout of this example is due to R. Davis.←.
.P104:
.BEGIN TURN ON "\";NOFILL;TABS 9,14,24,29,34,44,49,54,65,67;preface 0 mills;
.TURN OFF "←";
\\\\\\\\\%B⊃%*
\\\\\\\\\%Bεα→ %36%*
\\%3fact[3]\\\[x=0→ ...]\\\*[x;fact[x-1]]\%bε←⊃%*
\\%1E%40%*\\\E%41%*\\\E%41%*\%b~ ~%*
\ \| /\\ \| E%40%*\\ \| E%40%*\%b$ ↑%*
\_______\=>\_______\=>\_______ =>\\%32%*
\%3fact\| λ[[x][x=0→1;...]\ x\| 3\\ x\| 3\\%b↑%*
\\\\\\\\\%B⊃ ~%*
\\\\\\\\\%Bε→$%*
\\%3fact[2]\\\[x=0→ ...]\\\*[x;fact[x-1]]\%b~%*
\\%1E%41%*\\\E%42%*\\\E%42%*\%bε←⊃%*
\ \| E%40%*\\ \| E%41%*\\ \| E%41%*\%b$ ↑%*
=>\_______\=>\_______\=>\_______ =>\\%31%*
\%3 x \| 3\\ x\| 2\\ x\| 2\\%b↑%*
\\\\\\\\\\%B~%*
\\\\\\\\\%B⊃ ~%*
\\\\\\\\\%Bε→$%*
\\%3fact[1]\\\[x=0→ ...]\\\*[x;fact[x-1]]\%b~%*
\\%1E%42%*\\\E%43%*\\\E%43%*\%bε←⊃%*
\ \| E%41%*\\ \| E%42%*\\ \| E%42%*\%b$ ~%*
=>\_______\=>\_______\=>\_______ =>\\%b~%*
\%3 x \| 2\\ x\| 1\\ x\| 1\\%b↑%*
\\\\\\\\\\%31%*
\\\\\\\\\\%B↑%*
\\\\\\\\\\%B~%*
\\%3fact[0]\\\[x=0→1; ...]\\\\%b⊃ ~%*
\\E%43%*\\\E%44%*\\\\%b~ ↑%*
\ \| E%42%*\\ \| E%43%*\\ \%1send%*\%b~ ~%*
=>\_______\=>\_______\=> 1\\ %31 \%bε→$%*
\%3 x \| 1\\ x\| 0\\ \%1back up\%b$%*
.END
At the end of the first line we are faced with the evaluation of
%3*[x;fact[x-1]]%*. This requires the evaluation of the arguments
to *; this is done by %3evlis%* and requires the evaluation of %3fact[x-1]%*
using the environment E%41%*. In E%41%* %3x-1%* gives %32%*, and
we find the definition of %3fact%* in E%40%*. In line two
we set up E%42%* and evaluate %3fact[2]%*. Analogous situations
occur until line four; at this time we suddenly find ourselves in
E%44%* with %3x%* bound to %30%*. The predicate %3x=0%* is satisfied and
we start back up the right margin to conclude the nested evaluations
of %3*[x;fact[x-1]]%*. This process finally terminates at the
top, returning a value %36%*.
.APART
Notice in this example that we will get the correct binding of %3x%* locally.
It is important to note that the occurrence of %3fact%* within the body of
the definition of %3fact%* is %2global%*. We find the correct binding
for %3fact%* by searching the access chain.
We must search the access chain even though %3fact%1 is global.
We cannot shortcut the search by simply looking in E%40%1. A variable
might have been rebound in an enclosing environment and it would be
that binding we should discover.
.GROUP
As a final example showing access to non-local variable bindings consider
%3f[3]%* where %3f#<=#λ[[x]g[2]]%* and %3g#<=#λ[[y]x+y]%*.
.BEGIN TURN ON "\";NOFILL;TABS 17,20,25,38,40,56,61,64,69,72,85;
.GROUP
\\%3f[3]\\\g[2]\\\x + y ....
\\E%40%*\\\E%41%*\\\E%42%* ....
\\| /\\ \| E%40%*\\\| E%41%*
\______\ =>\______\=>\______\ ......
\ %3f\| λ[[x] g[2]]\%3x\| 3\ \y\| 2 ...
\ %3g\| λ[[y] x+y]
.END
Notice that when we attempt to evaluate %3x + y%* we find %3y%* has a local
value, but we must look down the access chain to find a binding for %3x%*.
.APART
The scheme for using Weizenbaum environments for the current LISP subset
is:
.BEGIN INDENT 5,5;GROUP;
.P189:
When doing a λ-binding, set up a new E%4new%* with
the λ-variables as the local variable entries and the values of the
arguments as the corresponding value entries.
The access slot of the new E%4new%* points
to the previous symbol table. The evaluation of the body of the
λ- expression takes place using the new table; when a local variable
is accessed we find it in E%4new%*; when a non-local or global variable occurs, we
chase the access chain to find its value.
When the evaluation of the body is completed, E%4new%* disappears
and the previous environment is restored.
.END
You should convince yourself that the current access- and binding-scheme
espoused by LISP is faithfully described in these diagrams.
.CENT(Problem)
.P223:
%21.%* By now you should realize that environments really are a class of abstract
data structures: they should have constructors, selectors, and recognizers.
To help discover what a set of such functions might be,
give a representation for Weizenbaum environments and write new versions
of the symbol table manipulating functions,
%3lookup%1 and %3mkenv%1, which will operate on Weizenbaum environments.
See {yon(P212)}.
.SS(%3label%*,%3label%*,P38:)
.BEGIN TURN ON "←";
One effect of placing "λ" and a list of λ-variables in front
of an expresson is to bind those variables which appear in
the λ-list. All other variables
appearing in the expression are non-local.
For example, %3f%* is non-local in the following:
←%3f <= λ[[x][zerop[x] → 1; %et%* → *[x;f[x-1]]] ].%1
Clearly our intention is that the %3f%* appearing to the right of "<="
is the same as the %3f%* appearing to the left of "<=".
This has not been a problem for us. We have simply pre-loaded the
symbol table, binding %3f%* to its definition (or value); see {yon(P42)}. LISP
has been equipped with a more elegant device for this binding,
called the %3label%* operator. It is used thus:
←%3label[%1<identifier>;<function>]
and has the effect of binding the <identifier> to the <function>.
The value constructed by executing a %3label%*-expression is
a representation of a function with name <identifier> and body <function>.
For example, a proper definition of %3fact%* is:
←%3label[fact; λ[[x][eq[x;0] → 1; %et%* → *[x;fact[sub1[x]]]]]]%1
To include %3label%* in the LISP syntax add:
.BEGIN TABIT1(11);CENTERIT;
←<function>::= %3label%*[<identifier>;<function>]
and the S-expr translation of the %3label%* construct should naturally be:
←%eR%f( %3label[f;fn] %f)%3 = (LABEL %eR%f( %3f %f)%3 %eR%f( %3fn %f)%3)%1
Note that %3label%* is a special form, not a call-by-value function.
.END
A typical application of the %3label%* construct, say
%3label[f;λ[[x]%9x%*[x]]][A]%1.
results in the following environmental picture when we get ready to evalute
%9x%3[x]%1:
.BEGIN TURN ON "\";NOFILL;TABS 5,13,28,33,36,51,56,59,64,67,70;GROUP;
\%3label[f;λ[[x]%9x%*[x]]][A]%1 \\%9x%3[x]%1
\\E%40%*\\\E%41%*
\ \ | /\\ \| E%40%*
\______________\=> ... \______
\ \ | \\%3f\| λ[[x]%9x%*[x]]
\ \ \\x\|A
.END
Notice that the definition does not appear in the global table E%40%1;
we use %3label%1 to create temporary function definitions.
These definitions disappear
when the environment in which the %3label%1 was executed is no longer
accessible to the computation.
Thus within the evaluation of the body %9x%3[x]%1
a recursive call on %3f%*
will refer to the definition of %3f%1 located
in E%41%1 so long as %3f%1 is not rebound in %9x%1;
once we have completed the computation
initialized in E%40%1 the definition of %3f%1 will disappear.
If %3f%*
is not recursive, then the use of %3label%* is unnecessary; an
anonymous function application will suffice.
.BEGIN TURN ON "#";
What about statements like "evaluate %3g[A;B]%* where
%3g#<=#λ[[x;y]#...#f[u;v]#...]%* and %3f#<=#λ[[x;y]#...#]%1."?
%3label%* defines only one function; we may not say %3label[f,g;#...#]%*.
What we %2can%* do embed the %3label%*-definition for %3f%* within
the %3label%*-definition for %3g%*⊗↓Indeed %2every%* occurrence of %3f%*
must be replaced by the %3label[f;...]%1 construct.←. Thus:
.BEGIN CENTER;SELECT 3;
label[g;#λ[[x;y]#...#label[f;#λ[[x;y]#...#]][u;v]#...]]%*
.END
Implementations of LISP offer better definitional facilities, with "<="
having the effect of permanently establishing the definition in E%40%1.
.END
The apparent simplicity of the %3label%* operator is partly due to
misconception and partly due to the restrictions placed on the current
subset of LISP. %3label%* appears to be a
rather weak form of an assignment statement. When we extend LISP to include
assignments we can show that such interpretation of %3label%* is
insufficient.
.CENT(Problems);
I. Show one way to change %3eval%* to handle %3label%*.
II. Express the definition of %3reverse%* given on {yon(P97)} using
%3label%*.
.BEGIN GROUP;
III Evaluate the following:
.SELECT 3;CENTERIT;
←λ[[y]label[fn;fn%42%*][%ef%*]] [%ef%*]
%1where:%3
←fn%42%* <= λ[[x][y → 1; x → 2; %et%* → fn%41%*[%et%*]]
←fn%41%* <= λ[[y]fn[y]]
.END
.END
.SS(Functional Arguments and Functional Values,functional argument,P76:)
.BEGIN TURN ON "#","←";
Recall our discussion of :
.BEGIN CENTERIT;SELECT 3;
←%3eval[(F#2#3);((F#.#(LAMBDA (X#Y) (PLUS#(EXPT#X#2) Y))))].%*
.END
We now know this is equivalent to:
.BEGIN CENTERIT;
←%3eval[((LABEL#F#(LAMBDA (X#Y) (PLUS#(EXPT#X#2)#Y)))#2#3);( )]%1.
.END
In either case, the effect is to bind the name %3f%* to the λ-expression.
Binding also occurs when %3f%* is called: we bind %3x%* to %32%*, and %3y%*
to %33%*.
In the latter case we are binding simple values; in the former we are binding
functions as values. We have decided that the necessary ingredients to
characterize a functional value
⊗↓It would be better
to call these constructs "procedure values" since we will
take a decidedly algorithmic interpretation of them. As we now
know, there are important differences between functions and
algorithms.←
are a representation of the formal parameters, and a representation of the
expression described in the body of the function.
In this section we will examine the adequacy of that decision.
We will proceed informally with a few examples and see what happens.
Assume we have a list %3l%* of dotted-pairs %9α%41%1#,...,#%9α%4n%1, and we
wish to form a new list of the form %3(car[%9α%41%3]#...#%3car[%9α%4n%3])%1.
That is we wish to apply %3car%* to each of the elements of %3l%*. Such a function is
easy to write:
←%3carfirst <= λ[[l][null[l] → ( ); %et%* → concat[car[first[l]];carfirst[rest[l]]]]].%1
Now suppose we wish to write a more general function, which instead
of being specific to %3car%*, will take an %2arbitrary%* unary function %3f%* and
apply it to each of the elements of %3l%*,
generating %3(f[%9α%41%3],#...,#%3f[%9α%4n%3])%1.
Such a function could plausibly be defined as follows:
.P34:
←%3mapfirst <= λ[[fn;l][null[l] → ( ); %et%* → concat[fn[first[l]];mapfirst[fn;rest[l]]]]].%1
Thus the first calculation we requested above could be expressed as:
←%3mapfirst[car;l]%1 .......or could it?
Recalling LISP's penchant for call-by-value evaluation, we should realize
that the computation would not be done as expected. We do %2not%* want the
argument %3car%* evaluated. We want its %2name%*. We have seen one artifact in LISP
to subdue evaluation: we can make it a constant by %3quote%1-ing it. Indeed,
←%3mapfirst[quote[car];l] = mapfirst[CAR;l] %2will%1 work.
You should convince yourself that %3mapfirst[CAR;l]%* will compute %3carfirst[l]%*;
that exercise requires examining the details of %3eval%1.
Before going on to more complex examples it would be well to note that
%3mapfirst%* is a different kind of LISP function from those we have seen before.
The first argument to %3mapfirst%* is expected to be a function. Notice
that the argument %3fn%* appears in the body of %3mapfirst%* in a position reserved
for functions. Thus any actual parameter bound to %3fn%* is expected to be a
representation of a function.
Such a use of a function is called a %2functional argument%*.
The trick we used above of representing the functional argument %3car%1 as a
constant %3CAR%1
can be applied to other instances of functional arguments.
Thus the functional argument:
←%3λ[[x]f[g[x]]%* could be represented as,
←%3(LAMBDA(X)(F(G X))).%*
The difficulty is that the trick⊗↓The trick is called %3QUOTE%*-ing
the functional argument since the S-expr representation of an
instance of such a construct is a %3QUOTE%1-ed expression.←
is not sufficient to capture the intended meaning in all cases.
To understand why %3QUOTE%1-ing is not sufficient we need
a slightly more complex set of examples. First we try:
←%3mapfirst[quote[λ[[x]concat[;()]]];(A B C D)]%1⊗↓Note that we
are continuing to use %3quote%1 rather than write out the representation.←
which we expect to evaluate to %3((A)(B)(C)(D)).%*
.BEGIN TURN ON "\";NOFILL;TABS 5,13,28,43,46,61,66,69,74,77,80;GROUP;
%3mapfirst[quote[λ[[x]concat[x;()]]]; ..]\[null[l] ...]%1
\\E%40%*\\\E%41%*
\ \| /\\ \| E%40%*
\______________\=>\______\=> . . .
\%3mapfirst\| λ[[fn;l][null[l]...]]\\l\| (A B C D)
\\\\fn\| quote[λ[[x]concat[x;()]]]%1
.END
Now since %3null[l]%* is false, we⊗↓When we say "we" we really mean %3eval%1.←
reduce the problem to:
.BEGIN TURN ON "\";NOFILL;TABS 25,33,48,53,56,61,76,79;GROUP;
\%3concat[fn[first[l]];mapfirst[fn;rest[l]]].%1
\\E%41%*
\ \| E%40%*
\______________
\ %3l\| (A B C D)
\ fn\|quote[λ[[x]concat[x;()]]]%1
.END
Since we're doing call-by-value we have to evaluate the arguments
to %3concat%*; that requires evaluating %3fn[first[l]]%*. The value of
%3l%* we find locally and evaluate %3first[l]%*, getting %3A%*. The value
for %3fn%* is also found locally, and since it is the representation of
a λ-definition, we set up a
new environment in which to evaluate the body of %3fn%*, binding the λ-variable
%3x%* to %3A%*:
.BEGIN TURN ON "\";NOFILL;TABS 25,33,48,53,56,71,76;GROUP;
\%3concat[x;()]%1
\\E%42%*
\ \| E%41%*
\______________
\ %3x\| A%1
.END
The expected evaluation takes place: %3(A)%* is computed and returned to
environment E%41%1 so that we may continue the evaluation %3mapfirst[fn;rest[l]]%*.
However, consider the following variant of this last example.
Define:
←%3foo <= λ[[l]mapfirst[quote[λ[[x]concat[x;l]]]; (A B C D)]%*.
It would
seem that %3foo[(#)]%* would also give %3((A)(B)(C)(D))%*
since %3l%* will be bound to %3(#)%* and therefore the %3l%* in the functional
argument will effectively be %3(#)%*.
.BEGIN TURN ON "\";NOFILL;TABS 1,9,24,29,32,47,52,55,60,63,66;GROUP;
\ %3foo[( )]\ mapfirst[quote[λ[[x]concat[x;l]]]; ..]\ [null[l] ... ]
\\%1E%40%*\\\E%41%*\\\E%42%*
\ \ | /\\ \| E%40%*\\ \| E%41%*
\______________\=>\______\=>\______ => ...
\ %3foo\| λ[[l]... \\l\| ( ) \\l\| (A B C D)
\mapfirst\| λ[[fn;l][null[l]...]]\\\ \\fn\| quote[λ[[x]concat[x;l]]]%*
.END
%3null[l]%* is false since %3l%* is %3(A B C D)%*, so we evaluate
%3concat[fn[first[l]]#...#]%*. This involves evaluating %3first[l]%*
in E%42%1, giving %3A%*. We evaluate (look up) %3fn%* in E%42%1 and,
finding a representation of a λ-definition,
we make a new environment E%43%1 in which to evaluate
the body of %3fn%*. As we make E%43%1, we add an entry binding %3x%* to %3A%*
and we settle down in E%43%1 to evaluate %3concat[x;l]%*:
.BEGIN TABIT2(25,30);GROUP;
\\%3concat[x;l]%1
\\E%43%*
\ \| E%42%*
\__________
\ %3 x\| A
.END
Since %3l%* is non-local to E%43%1, we follow the access chain to find its value
in E%42%1 to be %3(A B C D)%1. But that's not the expected value! We expected
to find %3(#)%*, which was hidden away in E%41%1.
The trouble here is that
%3l%* was rebound in the interim. The first thing to note is that the problem
is caused by free variables: %3l%* is free in the functional argument.
Local variables aren't problematic; neither are global variables.
Next note that the desired binding for %3l%* is the one which was current
when we were binding the functional argument to the formal parameter %3fn%*.
A plausible solution then is to replace all non-local variables with their values
at the time we recognize the functional argument. This will not always suffice.
See {yon(P94)} for a counterexample.
A more promising solution is to
associate the name of the current environment with the function and use
that pair as the value to be given to the formal parameter.
When we want to apply the functional argument we set up a new environment as
always, introducing a local table with the λ-variables bound to their values;
only %2now%* we use the saved environment as the beginning of the access chain.
Then the values of
any non-local variables which we encounter in the process of applying the functional
argument will be searched for in the saved environment.
To initialize this process we require the recognition of instances of functional
arguments. We introduce a new operator called ⊗→%3function%*↔←.
This operator takes one argument: a representation of the function. The effect of
%3function%1 will be to construct a value representing that argument and the
environment which was current when the %3function%1-instance was evaluated.
Thus in the example we would recognize the %3function%*-construct while
evaluating the arguments to %3mapfirst%*; the environment which was current
then was E%41%*. Therefore as we build E%42%* we want to associate the pair
%3λ[[x]concat[x;l]]#-#%1E%41%* with the formal parameter %3fn%*. Whenever
we apply %3fn%* we want to use %3λ[[x]concat[x;l]]%*; and within that
context, whenever we want %3l%*, we want the value of %3l%* in E%41%*.
The function-environment pair is called a %2⊗→closure↔←%* or
%2⊗→funarg↔←%*.
In our diagrams we will designate the pair as:
←<function>:<environment>.
Thus in the above example we should designate the value of the functional argument
as:
←%3λ[[x]concat[x;l]]%1:E%41%1.
We must also extend the manipulation of Weizenbaum environments to
handle such constructions.
The process which recognizes λ-definitions and sets up new environments must now
watch for funargs.
When it sees one it uses the associated
environment as the access environment.
Let's do the example again.
.BEGIN TURN ON "\";NOFILL;TABS 5,13,28,33,36,51,56,59,64,67,70;GROUP;
\ %3foo[( )]\ mapfirst[function[λ[[x]concat[x;l]]; ..]\ [null[l] ... ]
\\E%40%*\\\E%41%*\\\E%42%*
\ \| /\\ \| E%40%*\\\| E%41%*
\______________\=>\______\=>\______ => . . .
\ %3foo\| λ[[l]... \\l\| ( ) \\l\| (A B C D)
\mapfirst\| λ[[fn;l][null[l]...]]\\\ \\fn\| λ[[x]concat[x;l]]%1:E%41%1
.END
Things are as before except now %3fn%1 is bound to the funarg pair
in E%42%*.
We look up %3fn%* in E%42%1 and,
finding a λ-definition, we make a new environment E%43%1 in which to evaluate
the body of %3fn%*. As we make E%43%1, we add an entry binding %3x%* to %3A%*.
But now since the λ-definition is a funarg we make the access environment
E%41%1 as saved with %3fn%*.
Thus we settle down in E%43%1 to evaluate %3concat[x;l]%*:
.BEGIN TABIT2(25,30);GROUP;
\\%3concat[x;l]%1
\\E%43%*
\ \| E%41%*
\__________
\ %3 x\| A
.END
Since %3l%* is non-local to E%43%1, we follow the access chain to find its value
in E%41%1 to be %3(#)%1 as desired.
Thus instead of simply tracing back to the previous environment we detour
around E%42%*:
.BEGIN SELECT g;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#";
.TURN ON "∞" FOR "%";
.TURN ON "?" FOR "\";
.TURN OFF "\","←";
.TABS 30,33;
?∞1E∞40∞g
?↑
?~
?∞1E∞41∞g←?⊃
?↑?.
?~?.
?∞1E∞42∞g?↑
?↑?.
?~?.
?∞1E∞43∞g→?$
.END
However, there is still some information which we must make explicit
if these Weizenbaum diagrams are to faithfully represent the process of evaluation.
Namely, after we have finished the evaluation of %3concat[x;l]%1 we are to
restore a previous environment. Which one is it? It isn't E%41%*,
it's E%42%1! That information is not available in our diagram, so
we must correct the situation.
In the left-hand quadrant of our diagram we
place the name of the environment which we wish restored when we leave the
current environment.
That environment name will be called the %2⊗→control environment↔←%*,
and will head a chain of environments, called the control chain
⊗↓In Algol, the access chain is called the static chain, and the control
chain is called the dynamic chain.←.
Here's the correct picture:
.BEGIN TABIT2(25,30);GROUP;
.P190:
\\%3concat[x;l]%1
\\E%43%*
\E%42%*\| E%41%*
\__________
\ %3 x\| A
.END
So after we have finished the computation in E%43%* we return control to E%42%*.
Thus the general structure of an environment is as follows:
.BEGIN TABIT2(33,42);GROUP;
\\%aForm%1
\\E%4current%*
\E%4control%1\| E%4access%*
\______________________________
\var\| value
\%3x%41%1\| . . .
\%3x%42%1\| . . .
\. . .\| . . .
\%3x%4n%1\| . . .
\\|
.END
Here's another example, involving a function to produce the composition
of two unary functions. We will call the function %3compose%*. The value
returned by %3compose%* will be a function. Thus %3compose%* will produce
functional values:
.BEGIN CENTERIT;SELECT 3;GROUP;
←compose[function[car];function[cdr]] = cadr
%1with a plausible definition as:%3
←compose <= λ[[f;g]λ[[x]f[g[x]]]]
.END
This definition of %3compose%* is almost right. The
value returned by %3compose%* is to be a function. Indeed it is an instance of
a %2⊗→functional value↔←%*, so, as with functional arguments, it needs to be
decorated with %3function%* so that the environment which contains the right
bindings for %3f%* and %3g%* is saved. Which environment is that? It's the
environment which will be current when the %3function%*-construct is recognized.
So we write:
.BEGIN CENTER;
.p152:
←%3compose <= λ[[f;g]function[λ[[x]f[g[x]]]]].
%1Now try: %3app[cons[A;(B . C)];compose[function[car];function[cdr]]]%1
where: %3app <= λ[[y;f]f[y]]%1
.END
As usual we evaluate the arguments to %3app%*, bind the results
to %3y%* and %3f%* and evaluate the body of %3app%*.
.BEGIN TURN ON "\";NOFILL;TABS 15,23,38,43,46;GROUP;
\%3app[cons[A;(B .C)];function[compose[function[car];function[cdr]]]]%1
\\%1E%40%*
\ \/| /
\______________
\ %3app\ | λ[[y;f]f[y]]
\compose\ | λ[[f;g]function[λ[[x]f[g[x]]]]]%1
.END
Evaluation of the first argument to %3app%* brings no surprises; we get
%3(A#.(B#.#C))%1. We begin evaluating the second argument;
we find the definition of %3compose%* in the environment and
since it is a λ-definition we set up a new environment, E%41%*, and evaluate
the body %3function[λ[[x]f[g[x]]]]%*:
.BEGIN TURN ON "\";NOFILL;TABS 25,33,48,53,56,71;GROUP;
\%3function[λ[[x]f[g[x]]]]%*
\\%1E%41%*
\ E%40%*\| E%40%*
\ __________
\ %3f\| car%1:E%40%3
\ g\| cdr%1:E%40%1
.END
Again, the recognition of the %3function%*-construct says return a funarg-pair
as value. The environment we associate is the current one, E%41%*. We now go back
to E%40%*, using the control chain. Since both arguments to %3app%* are now
evaluated, we find the definition of %3app%* and set up a new environment E%42%*.
Thus:
.BEGIN TURN ON "\";NOFILL;TABS 15,23,38,43,46,61;GROUP;
\\%3f[y]\\\f[g[x]]%1
\\E%42%*\\\E%43%*
\ E%40%*\| E%40%*\\ E%42%*\| E%41%*
\ __________\=>\_________
\ %3y\| (A. (B . C))\\x\| (A. (B . C))
\ f\| λ[[x]f[g[x]]]%1:E%41%1
.END
The form to be evaluated in E%42%* is %3f[y]%*; we find %3y%* and %3f%*
both locally. We evaluate the argument %3y%1,
then since %3f%* is a λ-definition, we set up a new environment
binding the λ-variable %3x%* to the value %3(A#.(B#.#C))%*. But the λ-definition
is also a funarg; therefore the access environment stored in E%43%* is E%41%*.
The control component of E%43%* is set to the prior environment, E%42%*; and
we begin evaluation of the body %3f[g[x]]%*.
Now in E%43%* we find %3x%* locally but have to resort to the access chain to
find %3f%* and %3g%*; using funargs, we have set up the appropriate
environments. From E%43%* we have access to E%41%*⊗↓and from there to E%40%*
if it were needed.←:
.BEGIN SELECT g;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","\";
.TURN ON "∞" FOR "%";
∞1E∞40∞G
/\
/ \
/ \
∞1E∞41∞G ∞1E∞42∞g
. \
. \
.∞1E∞43∞g
.END
The rest of the evaluation goes without a hitch: we finish the evaluation
in E%43%* and return to E%42%* and finally to E%40%* following the control
evironments.
Notice that %3f%1 and %3g%* in the body of %3compose%* are free variables
and therefore their bindings are not to be found in the local environment.
Since the interesting applications of such functions usually involve
free variables, we must deal with them.
In particular, the %3label%1 operator will typically involve free
variables. We remarked that %3f%1 in:
.BEGIN CENTERIT;
←%3f <= λ[[x][zerop[x] → 1; %et%* → *[x;f[x-1]]] ]%1
.END
is free. But we want the occurrence of %3f%1 on the right to be
synonymous with the %3f%1 being defined on the left. We can do this
by "tying a knot" in the access environment chain. Thus we should modify the
diagram for %3label%1 to be:
.BEGIN TURN ON "\";NOFILL;TABS 5,13,28,33,36,51,56,59,64,67,70;GROUP;
.P192:
\%3label[f;λ[[x]%9x%*[x]]][A]%1 \\%9x%3[x]%1
\\E%40%*\\\E%41%*
\ \ | /\\ \| E%40%*
\______________\=> ... \______
\ \ | \\%3f\| λ[[x]%9x%*[x]]%1:E%41%3
\ \ \\x\|A
.END
.P94:
Notice that the effect of %3label%1 is to build %3function[λ[[x]%9x%3]]%1
and assocate that with %3f%1.
If we attempted to implement %3function[%9f%3]%1 by replacing all non-local
variables in %9f%1 by their current values we wouldn't always get what we expect.
.BEGIN CENTERIT;
Consider ←%3 fact <= λ[[x][x=0 → 1; %et%* → *[x;fact[x-1]]]]%1:
.END
.BEGIN TABIT3(30,34,38);GROUP;TURN OFF "←";
If the current environment is E%4i%1:
\\E%4i%*
\E%4c%*\|\E%4a%*
\___________
\%3fact%*\|%3foo%*
.APART
.GROUP
then executing %3fact <= ... %* should give something like:
\\E%4i%*
\E%4c%*\|\E%4a%*
\___________
\%3fact%*\|%3λ[[x] ...fact[x-1]] : %1E%4i%*
.APART
.GROUP
rather than:
\\E%4i%*
\E%4c%*\|\E%4a%*
\___________
\%3fact%*\|%3λ[[x] ...foo[x-1]]
.END
Since every language construct in LISP must have an S-expr representation
we must include the %3function%1-construct. Its translation scheme is simple:
represent %3function[%9x%3]%1 as %3(FUNCTION %eR%f(%9x%f)%3)%1.
.GROUP;
Thus:
.BEGIN CENTERIT
←%3function[λ[[x]f[g[x]]] %* has an %eR%*-image of,
←%3(FUNCTION(LAMBDA(X)(F(G X)))).%*
.END
.APART
Similarly we must develop parts of %3eval%1 to deal with %3FUNCTION%1.
The device LISP used to associate environments with functions is called
the ⊗→%3FUNARG%*↔← device. (More is said about implementations of %3FUNARG%* in {yonss(P3)}.)
When %3eval%* sees the construction %3(FUNCTION %1fn%3)%* it returns as value
the list:
←%3(FUNARG###%*fn### current-symbol-table%3)%*.
When %3eval%*, or actually %3apply%* sees %3(FUNARG %*fn st%3)%*, that is,
when we are %2calling%3 fn%1, we use the symbol table %3st%*, rather than
the current symbol table for accessing free variables.
.P215:
Thus there are %2two%* environments involved in the proper handling of functional
arguments. First there is the environment which is saved with the %3FUNARG%*.
This is called the %2⊗→binding environment↔←%* since it is the
environment current at the time the functional argument was constructed
or bound. The second environment, called the %2⊗→activation environment↔←%*,
is the environment which is current when the functional argument is %2applied%*
or activated. Thus the activation environment is used to locate local
variables, but if a non-local variable is needed then the binding
environment is selected.
It is the duty of %3eval%* and %3apply%* to use the %3FUNARG%* device
to maintain the proper control of the activation and binding environments.
.<<pic of funarg execution>>
.P79:
.GROUP;
Finally, we should update our description of the usage
of Weizenbaum environments given on {YON(P189)}:
.BEGIN INDENT 5,5;
When the %3function%1 construct is recognized, we manufacture a %3FUNARG%1 triple
consisting of the atom %3FUNARG%1, the function described in the
instance of %3function%1, and the current environment.
This triple is the value of the %3function%1 construct and may thus be bound
to any LISP variable; typically the LISP variable will appear in an expression
in a position reserved for functions.
When doing a λ-binding, set up a new E%4new%* with
the λ-variables as the local variable entries and the values of the
arguments as the corresponding value entries.
The control slot of the new E%4new%* always points
to the previous symbol table. The access slot also points to the previous
environment unless the function being applied is a %3FUNARG%1.
If it %2is%1 a %3FUNARG%1, then set the access slot to the environment
which was saved with the %3FUNARG%1.
The evaluation of the body of the
λ- expression takes place using E%4new%1; when a local variable
is accessed we find it in E%4new%*; when a non-local variable occurs, we
chase the access chain to find its value.
When the evaluation of the body is completed,
the previous environment is restored.
E%4new%* disappears unless the value of the computation is a functional
value.
.END
.APART
Notice that there is a certain asymmetry about access and control.
The control slot always points at the previous environment, while the access
slot may vary. It may follow control, as is the case on simple function calls;
it may point to an environment further down the control chain, as is the case for
functional arguments; it may point to an environment which control cannot
return to, as is the case for functional values; or it may point to itself
as is the case for %3label%1's implementation.
There is another asymmetry in the properties of access and control.
The access environment is a self-sufficient data structure; it can be described
and manipulated as such using the usual constructors, selectors, and recognizers.
Typically such environments come into existence as a part of a computation; they
are constructed during the λ-binding process.
We can implicitly
save such an environment through the %3FUNARG%1 device; and
we can explicitly build
such environments using the data structure operations and pass them to %3eval%1
as a symbol table. But symbol tables are independent of the method used to create
them. In particular, once a table has been captured by a %3FUNARG%1 we need
not retain any information about the computation which created that table.
However the idea of "control" and "state of computation" is integrally tied
to access structure.
The state of the computation involves the expression currently being evaluated,
the history of those computations which are suspended and waiting for the
completion of the current computation, and it also involves the access evironment
since that is necessary for the crrect evaluation of variables.
To "save the state of computation" implies saving
the partial computation to that point, saving the expression being evaluated, and
saving the current access environment.
Thus to a large extent "control environment" is a misnomer. What we are intending
to capture is the idea of a suspended computation: suspended until the
subsidiary computation has been completed. Part of the suspended computation
%2is%1 the "control environment", but there's more. The Weizenbaum diagrams
show part of the information; they show the environments and the expressions
being evaluated. However they leave implicit the dynamics of the computation:
which argument is being evaluated, and where are the partial results being
stored, and where in the expression we are to continue when the subsidiary
computation is completed. In {yonss(P187)} we will develop a different
%3eval%1 family which will make much of this information explicit.
Also in {yonss(P187)} we will examine the possibility of expanding
the behavior of control
slots. That is, allowing environments other than the predecessor to appear in the
control slot of an environment.
What does all this say about functions? We have already remarked that functions are
parametric values; to that we must add that functions are also tied to the environment
in which they were created; they cannot be evaluated %3in vacuo%*.
.P89:
What does this say about "<="? It %2still%* appears to act like an assignment statement.
It is taking on more distinct character since it must associate environments with
the function body as it does the assignment.
The correct implementation of the semantics of %3function%1
seems like a lot of work to allow a moderately obscure construct to appear in a language.
However constructs like functional arguments appear in several programming languages
under different guises. Usually the syntax of the language is sufficiently
obfuscatory that the true behavior and implications of
devices like functional arguments is misunderstood. Faulty implementations
usually result. In LISP the problem %2and the solution%* appear
with exceptional clarity⊗↓Indeed, LISP was the first language to allow functional
values.←.
Here is a sketch of the abstract structure of the current %3eval%*.
.BEGIN SELECT 3;GROUP;TABIT1(12);
.P207:
eval <= λ[[exp;environ]
\[isvar[exp] → lookup[exp;environ];
\ isconst[exp] → denote[exp];
\ iscond[exp] → evcond[exp;environ];
\ isfun[exp] → makefunarg[exp;environ];
\ isfunc+args[exp] → apply[func[exp];evlis[arglist[exp];environ];environ]] ]
.APART
.GROUP
%1where:%*
apply <= λ[[fn;args,environ]
\[isfunname[fn] → ...;
\ islambda[fn] → eval[body[fn];mkenv[vars[fn];args;environ]];
\ isfunarg[fn] → apply[func%41%*[fn];args;evn[fn]];
\ ... ... ]]
.APART
.END
The reader is encouraged to complete the definitions, supplying appropriate
constructors, selectors and recognizers.
Now for some specific examples.
In particular, most implementations of LISP include a very useful class
of ⊗→mapping functions↔←.
.BEGIN INDENT 0,10;
⊗→%3maplist%*↔← is a function of two arguments, a list %3l%* and %3fn%1,
a functional
argument. %3maplist%* applies the function %3fn%* (of one argument)
to the list %3l%* and its tails (%3rest[l], rest[rest[l]],#..%1)
until %3l%* is reduced to %3(#)%*.
The value of %3maplist%* is the list of the values returned by %3fn%*.
Here's a definition of %3maplist%*:
.END
.GROUP;
←%3maplist <= λ[[fn;l][null[l] → ( ); %et%* → concat[fn[l];maplist[fn;rest[l]]]]]%1.
Thus:
%3←maplist[function[reverse];(A B C D)] = ((D C B A)(D C B)(D C)(D))%*. ⊗↓%3quote%1-ing
%3reverse%1 would also work since %3reverse%1 uses no free variables.←
.APART;
An interesting and non-trivial use of functional arguments
is shown on {yon(P136)} where we define a new control structure
suitable for describing algorithms built to operate on lists.
.CENT(Problems)
I. What changes should be made to the LISP syntax equations to
allow functional arguments?
.P151:
II. Use %3app%* on {yon(P152)} to define a function which computes factorial without
using %3label%* or explicit calls on the evaluator.
III. Extend %3eval%* and friends to handle functional arguments.
.GROUP;
.P99:
IV. An interesting use of functional arguments involves ⊗→self-applicative functions↔←.
An application of a function %3f%* in a context %3f[...;f;...]%* is an instance
of self application ⊗↓provided the designated argument position is a functional
argument.←. Self-applicative functions can be used to define recursive
functions in such a way that the definition is not %2statically%* self-referential,
but is %2dynamically%* re-entrant.
For example, here is our canonical example, written using a self-applicative function:
.APART
.BEGIN CENTER;SELECT 3;GROUP;
fact <= λ[[n]f[function[f]; n]]
f <= λ[[g;n][n=0 → 1; %et%* → *[n; g[g; n-1]] ]]
.END
Use Weizenbaum's environments to show the execution of %3fact[2]%*.
.END
.SS(Binding strategies and implementations,binding strategy,P134:)
After the discussion of variables in {yonss(P135)} and the
intervening discussions of environments, it should now be clear
that the root of the binding problem is free variables. We would rather not
outlaw free variables; many interesting recursive algorithm have free variables.
We don't want to restrict the use of free variables too precipitously since they
are a very useful programming technique. For example, the possible alternative
of passing all global information through as extra parameters in calling
sequences is overly expensive⊗↓Though much of that expense can be mitigated
by a clever compiler.←.
Handling of free variables varies from programming language to programming
language. The solution advocated by Algol-like languages is called %2⊗→static
binding↔←%* and dictates that all non-local references be fixed in the binding
environment; thus free variables aren't really free in the sense that
we have a choice to make.
LISP at least gives you a choice. Using %3quote%* you will get
the dynamic binding on free variables in a functional argument; using
%3function%* gives the static interpretation⊗↓A case can be made for even more
flexibility in the interpretation of free variables. We could ask that
the binding be done on a per variable basis.
That is we could declare which free variables are to be captured
statically and which are to be captured dynamically.
We could also ask that %2both%1
bindings be available and supply selectors which would access either the
dynamic or the static binding.←.
There are no questions about Algol's interpretation of functional values:
the construct is not allowed. When we discuss implementation of binding
strategies in {yonsec(P188)}
we will see why.
The binding strategy determines %2when%1 the variables will receive values; the
implementation determines %2how%1 those bindings are to be accomplished and
therefore, how the lookup of values is to be accomplished.
We have seen one implementation in the %3assoc#-#pairlis%1 pair
({yonss(P6)},#{yon(P213)}),
and on {yon(P223)} suggested a related implementation using Weizenbaum
environments. We propose to examine another implementation.
The most general environment structure which LISP creates is a tree
of local symbol tables, rooted in the global table. The typical LISP
computation generates a single branch, but functional arguments and values
can generate several branches. Locating a variable %3n%1 involved
searching the current branch from tip to root, looking for the first occurrence
of %3n%1. If %3n%1 was bound very deep, the search could be long.
Indeed the time is proportional to the depth of the branch. It has
been noted ⊗↑[Wegb#75]↑ that variables tend to be rebound rather seldom;
there are few occurrences of any given variable on any particular branch.
If this is the case, then the search will examine many environment blocks
which do not contain the desired variable. If the number of bindings of
any variable is small compared to the number of environment blocks
which have to be searched to find those bindings, then we would like a viable
alternative to the %3assoc#-#pairlis%1 implementation of %3lookup#-#mkenv%1.
That is, a scheme whose search is proportional to the number of
bindings for a variable, rather than proportional to the depth of the
tree.
There is such an alternative.
.P217:
Namely, we associate %2all%1 the values possessed by %3n%1 with %3n%1 itself.
To signify which environment was current when each binding was made, we
associate pairs consiting of the value and the environment name.
Thus the new
%3mkenv[vars;vals;env]%1 application must name a new environment (call it %3new%1);
attach %3new%1 to the tip of the
current branch in the environment tree. Also,
for each entry in %3vals%1, %3mkenv%1 must
associate a value-%3new%1 pair with each name in %3vars%1.
The %3lookup%1 procedure
is given the name %3n%1 and a branch in the environment tree.
Assume that the tip node of the tree is named Env.
If %3n%1 has an attached value pair whose environment component is Env,
then the associated value is returned by %3lookup%1. Otherwise
the environment branch is searched recursively by %3lookup%1
looking for the first node in that branch which has an associated value
attached to %3n%1.
.BEGIN GROUP;
Here's a graphical description of this reorganization of our symbol tables:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.BEGIN CENTER;SELECT 3;
∞2 The ∞3assoc-pairlis ∞2structure
.END
E1 E2 E4
⊂αααπααα⊃ ⊂αααπααα⊃ ⊂αααπααα⊃
~ x ~ 3 ~ ~ y ~ 4 ~ ~ x ~ C ~
εαααβαααλ %αααβααα$ %αααβααα$
~ y ~ 1 ~ ~ ↓
εαααβαααλ ↓ E3
~ z ~ A ~ ~ ⊂αααπααα⊃
%αααβααα$ ~ ~ z ~ 4 ~
~ ~ %αααβααα$
↓ ↓ ↓
~ E0 ~
%ααααα→αααα→⊂αααπααα⊃←αααα←αααα$
~ y ~ 2 ~
εαααβαααλ
~ z ~ 1 ~
%ααα∀ααα$
.BEGIN CENTER;SELECT 2;
The new structures
.END
x y z
E1 E4 ⊂ααααπααα⊃ ⊂ααααπααα⊃ ⊂ααααπααα⊃
↓ E2 ↓ ~ E1 ~ 3 ~ ~ E0 ~ 2 ~ ~ E0 ~ 1 ~
\ ~ E3 εααααβαααλ εααααβαααλ εααααβαααλ
\ ↓ / ~ E4 ~ C ~ ~ E1 ~ 1 ~ ~ E1 ~ A ~
\~/ %αααα∀ααα$ εααααβαααλ εααααβαααλ
↓ ~ E2 ~ 4 ~ ~ E3 ~ 4 ~
E0 %αααα∀ααα$ %αααα∀ααα$
.END
.APART
.GROUP
Using the discussion of %3mkenv%1 on {yon(P212)}, we define:
.SELECT 3;
.BEGIN CENTER;
.P218:
alloc <=λ[[x] gensym[]]
link <= λ[[block;env] concat[block;env]]
send <= λ[[n;v;blk] addval[n;mkent[first[blk];v]]]
.END
.BEGIN FILL;SELECT 1;
The function %3gensym%1 is to generate a new environment name. Armed with
that, %3alloc%1 makes a new tip node on the end of the current environment
branch.
The function %3addval%1 adds a new entry to the variable %3n%1. The value of
%3addval%1 is %3env%1.
.END
.APART;
.GROUP;
.BEGIN SELECT 1;FILL;
The new %3lookup%1 is more complicated than the simple %3assoc%1. Given a node
in the environment branch, we must see if there is a related binding
for %3n%1. If there is, then that's the binding we want. If no binding is found
we look at the next deeper node on the tree, and check its bindings.
.END
.tabit1(18);
lookup <= λ[[n;env] λ[[z]look%9'%3[z;z;env]][getval[n]] ]
look%9'%3 <= λ[[l;l1;env]\[null[l] →look%9'%3[l1;l1;rest[env]]
\ eq[name[first[l]];first[env]] → value[first[l]]
\ %et%3 → look%9'%3[rest[l];l1;env] ]]
.END
This new scheme is called %2⊗→shallow binding↔←%1, and the %3assoc-pairlis%1
scheme is called %2⊗→deep binding↔←%1.
The essential differences between these two binding implementations is that
a deep binding search is keyed on the name of the variable, whereas a shallow
binding strategy is keyed on the environment name. These requirements
have corresponding implications for the
organization of the symbol tables.
We will discuss the relative merits of these implementations in more
detail in {yonsec(P188)}.
.CENT(Problems)
I. Suggest an implementation of %3addval%1 which will improve the search efficiency
of %3lookup%1.
.P221:
II. Analyze %3lookup%1 in a manner similar to that performed
on %3mkenv%1 on {yon(P212)}. Identify the parts of %3lookup%1 which are
independent of the binding implementation. Rewrite the shallow and deep
versions of %3lookup%1 in this more general setting.
.SS(Special Forms,Special Form,P8:)
We have remarked that the evaluation scheme for LISP functions is
call-by-value and, for functions with multiple arguments, left-to-right
evaluation of arguments. We have also seen, in %3quote%* and %3cond%*,
that not all forms to be evaluated in LISP fall within this category.
We have already noted on {yon(P44)} that %3QUOTE%* and %3COND%* are
%2not%* translations of functions in the LISP sense.
Indeed the purpose of %3quote%* was to %2stop%* evaluation.
Also, the "argument list" to %3cond%* is handled differently; its evaluation was
handled by %3evcond%*.
Since %3quote%1 and %3cond%1 were rather anomalous
we have called them %2special forms%*. However, now
we would like to discuss special forms as a generally useful technique.
Consider the predicates ⊗→%3and%*↔← and ⊗→%3or%*↔←. We might wish to define
%3and%* to be a binary predicate such that %3and%* is true just in case
%2both%* arguments evaluate to %et%*, and define %3or%* to be binary
and false just in case both arguments evaluate to %ef%*.
Notice two points. First, there is really no reason to restrict these predicates
to be %2binary%*. Replacing the words "binary" by "n-ary" and "both" by "all"
in the above description has the desired effect.
Second, if we evaluate the arguments to these predicates in some order,
say left-to-right, then we could immediately determine that %3and%*
is false as soon as we come across an argument which
evaluates to %ef%*; similarly a call on %3or%* for an arbitrary number
of arguments can be terminated as soon as we evaluate an argument giving
value %et%*.
But if we insist that %3and%* and %3or%* be LISP %2functions%* we can
take advantage of neither of these observations.
Rather we will
define %3and%* and %3or%* as special forms and handle
the evaluation ourselves. Presently, the
only way to handle special forms is to make explicit modifications
to %3eval%*.
On {yon(P201)} and in {yonss(P202)} we will discuss
simple ways to add such forms without modifying %3eval%1, and in {yonss(P30)}.
we will discuss efficiency considerations.
.BEGIN CENTERIT;SELECT 3;GROUP;TABIT1(17);
%1Recognizers for the predicates must be added to %3eval%*:
.SELECT 3;
←isand[e] → evand[args[e];environ];
←isor[e] → evor[args[e];environ];
←%1where:%3
.P53:
evand <= λ[[l;a]\[null[l]→ %et%*;
\ eval[first[l];a] → evand[rest[l];a];
\ %et%* → %ef%*] ]
evor <= λ[[l;a]\[null[l] → %ef%*;
\ eval[first[l];a] → %et%*;
\ %et%* → evor[rest[l];a]] ]
.END
Notice
the explicit calls on %3eval%*⊗↓Also notice that the abstract versions of
%3evand%* and %3evor%* know that the arguments are also presented as a sequence.
The structure of the recursion implies a left-to-right evaluation.←.
This is expensive, but cannot be helped.
Later we will show a less costly way to handle those "non-functions" which
have an indefinite number of arguments, all of which are
to be evaluated (see {yonss(P57)} on macros).
.CENT(Problems)
I What is the difference between a special form and call-by-name? Can call-by-name
be done in LISP (without redefining %3eval%*)?
.P71:
II %3select%* is a special form to be called as:
%3select[q;q%41%*;e%41%*;#...#;q%4n%*;e%4n%*;e]%1 and to be evaluated as follows:
%3q%* is evaluated; the %3q%4i%1's are evaluated from left to right until
one is found with the value of %3q%*. The value of %3select%* is the value
of the corresponding %3e%4i%1. If no such %3q%4i%1 is found the value of
%3select%* is the value of %3e%1. %3select%* is a precursor of the
%2⊗→case statement↔←%*; see {yon(P70)}.
Add a recognizer to %3eval%* to handle %3select%* and write a function to
perform the evaluation of %3select%*.
.SS(The %3prog%*-feature,,P37:)
.TURN ON "←";
%1
Though recursion is a significant tool for constructing LISP programs,
there is another technique for defining algorithms in LISP. It is
an iterative style of programming which is called the
⊗→%3prog%*↔← or %3prog%1ram feature.
Many algorithms present themselves more naturally as iterative
schemes.
Recall the recursive algorithms %3length%1 and %3length%41%1 given
on {yon(P19)}. These algorithms computed the length of a list.
Compare those schemes with the following:
%21.%* Set a variable %3l%1 to the given list. Set a variable %3c%1 to zero.
%22.%* If the list is empty, return as value of the computation, the
current value in %3c%1.
%23.%* Otherwise, increment %3c%1 by one.
%24.%* Set %3l%1 to the %3rest%* of %3l%1.
%25.%* Go to line 2.
Here is a LISP %3prog%1 version of the algorithm:
.BEGIN TABIT2(15,17);SELECT 3;GROUP;TURN OFF "←";
.P45:
⊗→length↔← <= λ[[x]prog[[l;c]
\\l ← x;
\\c ← 0;
\a\[null[l] → return[c]];
\\c ← c+1;
\\l ← rest[l];
\\go[a]] ]
.END
We have introduced several new symbols, formats, and functions
in this example. These innovations must be explained before
the example is complete. First,
the basic syntax of a %3prog%1 is given by:
.BEGIN tabit2(15,23);group;turnoff "←";
<prog>\::= %3prog%1[[<prog variables>]<prog body>]
<prog body>\::= <prog element><prog body> | <prog element>
<prog element>\::= <label> | <prog form>;
<label>\::= <identifier>
<prog form>\::= <application>
\::= <conditional statement>
\::= <assignment statement>
\::= <return statement>
\::= <go statement>
.END
.BEGIN tabit2(15,23);group;turnoff "←";
<conditional statement>\::= <conditional expression>
<assignment statement>\::= <identifier> ← <form>
<return statement>\::= %3return%1[<form>]
<go statement>\\::= %3go%1[<form>]
.END
The ⊗→%3prog%* variables↔←, %3l%* and %3c%1, in the example,
are ⊗→local variable↔←s. They act just like λ-variables with an implied
initialization to %3( )%1.
Thus %3prog%1s can be used recursively.
The %3prog%1 body is a sequence of %3prog%1#forms and labels.
Each %3prog%1#form is evaluated in the usual LISP manner, and since
the %3prog%1#body can consist of a sequence of %3prog%1#forms, the
%3prog%1-body is evaluated from left-to-right.
If the intent of the %3prog%1 was simply to execute the sequence of
%3prog%1#forms, in left-to-right order, then %3prog%1 could be replaced by
a much simpler construct like %3progn%1:
.BEGIN CENTERIT;SELECT 3;
←%3progn <= λ[[x%41%3; ...; %3x%4n%3] %3x%4n%3]
.END
However we will add constructs to LISP which will allow us to
vary the flow of control within the %3prog%1#body. It is to this
end that we use labels. Before we discuss labels and their associated
operations, we wish to discuss the more general character of %3prog%1#forms.
In LISP, every form returns a value; programming languages also have constructs
which are primarily executed for their effect rather than their value.
Such constructs are called %2⊗→statement↔←s%1. For example,
LISP implementations include a unary primitive named %3print%1 whose effect is to
print the value of its argument on the current output device. This
function also returns its evaluated argument as the value of the %3print%1
statement. Thus mathematically, %3print%1 acts like an identity function,
but its execution certainly affects the programmer's environment.
Operations which can have such non-local effects are said to have
%2⊗→side-effects↔←%1⊗↓Whether
the act of printing is a side-effect or the fact that %3print%1
returns a value is a side-effect depends on your point of view.←.
There is another common and useful programming construct
we wish to introduce into %3prog%1s which has side-effects.
It is called the %2⊗→assignment statement↔←%1.
As with all LISP
constructs, the assignment returns a value, but we identify it as a statement
since it is executed more for effect than for value.
In our example of %3length%1, we used an assignment to bind %3l%1 to the
value of %3x%1 and to bind %3c%1 to %30%1.
To evaluate an assignment, we first evaluate the form; then the identifier
is located by searching the access chain. Thus the identifier may be a
non-local variable. When the identifier is located its current value is replaced
by the value of the form. Notice that this is a different kind of binding than
that previously done by λ-binding. In λ-binding we always associated a new
value with a newly created local symbol table as we entered the λ-body.
We never changed the binding of a variable, though we could achieve the effect
by rebinding the variable. The semantics of the assigment involve changing the
binding. Thus assignments to non-local variables can have effect outside the
%3prog%1. Assignment statements therefore have a side-effect.
An assignment statement also has a value. Its value is the value of the form
on the right-hand-side.
As we intimated earlier, %3prog%1 introduces some new control structures
so that the %3prog%1#body need not be executed in simple left-to-right order.
The control structures are: the conditional statement, the %3return%1 statement,
and the %3go%1 statement.
Though conditional statements in %3prog%1s have the same syntax as conditional
expressions, their semantics is slightly different. A conditional
statement is executed in the usual manner unless none of the predicate
alternatives is satisfied. Recall that a conditional expression is undefined
in this case; a conditional statement however is defined, returns %3(#)%1,
and executes the next statement in the %3prog%1#body.
Thus in our %3length%1 example:
.BEGIN CENTERIT;SELECT 3;
←[null[l] → return[c]],
.END
if %3l%1 is not empty
the %3prog%1#body continues at the next statement with the assignment
of %3rest[l]%1 to %3l%1. If %3l%1 %2is%1 empty, then the statement %3return[c]%1
is executed.
The %3return%1 statement is a %3prog%1 construct similar in effect to
exiting a λ-expression. It is used to leave a %3prog%1#body and return to
the caller of the %3prog%1. As we leave the %3prog%1, the bindings of the
%3prog%1#variables are removed as are the λ-bindings made on entry to the %3prog%1.
The value returned is the value of the argument
to the %3return%1 statement. The %3return%1 statement may be nested within
other LISP computation, as for example:
.BEGIN CENTER;SELECT 3;
concat[A;return[list[B]]].
.END
However the evaluation of the %3return%1 is to take effect immediately;
the %3concat%1 would never complete its operation. We would return %3(B)%1
to the caller of the enclosing %3prog%1. A bit of care is needed in describing
the meaning of %3return%1: we look for the latest instance of an entrance to
a %3prog%1 and return from that %3prog%1. The simplest way to visualize this
is to use Weizenbaum environments#({yon(P190)}). We search the %2control chain%1,
looking for the first %aForm%1 which is %3prog[[#...#]#...#]%1. We then restore
using the access and control information found in that diagram.
We will give a comprehensive example after discussing the %3go%1 statement.
.P83:
The %3go%1 statement is used in conjunction with labels to divert the
implied left-to-right execution of the %3prog%1#body.
Labels really aren't executed; they are used to name statements in a %3prog%1.
It is the %3go%1 statement which uses the label as a destination for transferring
control.
Labels may conflict with the
λ-variables or %3prog%*#variables since the
evaluator for %3prog%*s can resolve the
conflicts by context. Any identifier occurring by itself in a
%3prog%1#body is a label. Any identifier occurring in an application other than
a %3go%1#statement is a variable and its value is searched for in the access chain,
whereas an identifier appearing in a %3go%1 statement is interpreted as a label and
searched for in a %3prog%1 body.
The ⊗→%3go%*↔← statement is a little more complicated than the %3return%1 statement.
If the argument
to %3go%* is an identifier then it is interpreted as a label;
otherwise, the argument
is %2evaluated%* and the result of the evaluation is interpreted as a label.
Once a label has been uncovered as the result of this evaluation we must locate a
statment in a %3prog%1 which has that label attached to it. Our intention
is to transfer control to that statement.
We locate the labeled statement as follows:
we look back through the control chain for the first %3prog%1 which contains
the label. When the label is found we transfer control to that labeled
statement, restoring the access and control environments of the %3prog%1
which contain that statement. Thus there is a double search involved:
we search the control chain for %3prog%1 forms, and search the %3prog%1 forms
for the label.
Labels need not be local; we find the closest dynamically surrounding
%3prog%1 which contains the label.
Notice that %3go%1 and %3return%1 are the first constructs we have seen
whose behavior on completion does not imply that we restore the previous
control environment.
Finally, as an example covering the new features of %3prog%1 consider:
.BEGIN SELECT 3;tabit3(20,33,35);GROUP;turn off "←";
\f <= λ[[y;z]prog[[l;x]
\\\l ← 2;
\\l\u ← g[y;x;z]
\\\ ... ]]
\g <= λ[[x;y;z]prog[[ ]
\\\ ...
\\\go[l]
\\\ ...
\\\return[first[x]]
\\\ ... ]]
.END
Notice in %3f%1, %3l%1 is both a label and a %3prog%1#variable.
Notice in %3g%1 that we have no %3prog%1 variables; and since we assume
that %3l%1 is not a label in %3g%1 we have a non-local %3go%1.
Consider the evaluation of %3f[(A B);3]%1.
.BEGIN TURN ON "\";NOFILL;TABS 9,14,24,36,41,51,56,61,72,74;GROUP;
.TURN OFF "←";
\\%3f[(A B);3]\\\prog[[l;x]...]\\\[l ← 2; l u ← g[y;x;z];...]
\\%1E%40%*\\\E%41%*\\\E%42%*
\ /\| /\\ E%40%1 \| E%40%*\\ E%41%1\| E%41%*
\_______\=>\_______\=>\_______ =>
\%3 f \| λ[[y;z]prog[...]]\ y\| (A B)\\ l\| ( )
\%3 g \| λ[[x;y;z]prog[...]]\ z\| 3\\ x\| ( )
.END
.BEGIN TURNOFF "←";
.group
At this point we have done the λ-binding and initialized the %3prog%1#variables.
As we begin the execution of the %3prog%1#body, we assign %32%1 to %3l%1 and,
since labels have no computational effect, begin the evaluation of the assignment
statement: %3u#←#g[y;x;z]%1:
.END
.BEGIN TURN ON "\";NOFILL;TABS 15,23,38,43,46,61;GROUP;turn off "←";
\\%3[... l u ← g[y;x;z];...]\\\[...u ← g[y;x;z]; ...]
\\E%42%*\\\E%42%*
\ E%41%*\| E%41%*\\ E%41%*\| E%41%*
\ __________\=>\_________
\ %3l\| 2 \\l\| 2
\ x\| ( ) \\x\| ( )
.END
.apart
.GROUP
We evaluate %3g[y;x;z]%1:
.BEGIN TURN ON "\";NOFILL;TABS 15,23,38,43,46,61;GROUP;
\\%3prog[[ ] ...]\\[...go[l]; ... return[first[x]]; ...]
\\E%43%*\\\E%44%*
\ E%42%*\| E%42%*\\ E%43%*\| E%43%*
\ __________\=>\_________
\ %3x\| (A B)
\ y\| ( )
\ z\| 3
.END
.APART
The %3go[l]%1 will search the control chain; it looks at the %3prog%1 of
E%43%1 but finds no label %3l%1. It finds the %3prog%1 of E%41%1 next, and there it does
find the label %3l%1. Thus execution would be continued in E%42%1, the
environment which bound the %3prog%1#variables, at
the assignment statement.
In general, we continue in the environment which was created on entry to the
%3prog%1#body.
Notice that once we have left E%44%1 there is no
way to jump back into it.
We can only search down the control chain, and the entry to %3g%1 is not below
that of %3f%1 on that chain.
An extension of the semantics of LISP could allow such generalized control
and we will develop some of those ideas in {yonss(P187)}.
.BEGIN TURNOFF "←";
If we executed the %3return[first[x]]%1 in E%44%1 an action similar to that of %3go%1
would transpire. We would evaluate %3first[x]%1, getting %3A%1. We would search the
control chain for the %2latest%1 %3prog%1#expression; here found in E%43%1;
and then return control to the environment designated in the control
quadrant; here E%42%1. Thus we return
%3A%1 as the value of %3g[y;x;z]%1. Since the call on %3g%1 was a component of
the assignment %3u#←#g[y;x;z]%1, we must complete that assignment. We search the %2access%1
chain for %3u%1. Since %3u%1 is not found we make a global assignment
in E%40%1:
.END
.BEGIN TURN ON "\";NOFILL;TABS 25,33,48,53,56,71,76;GROUP;
\\E%40%*
\ \/| /%*
\______________%3
\ %3f\| λ[[y;z] ...]
\ %3g\| λ[[x;y;z] ...]
\ %3u\| A
.END
The ability to evaluate the argument to %3go%1 results in
a useful programming trick. Let %3l%* be a list of dotted pairs,
each of the form, %3(%* object%4i%* . label%4i%3)%1. At each label%4i%* we
begin a piece of program to be executed when object%4i%* has been recognized.
Then the construct:
.P25:
%aUGH%*←%3go[cdr[assoc[x;l]]]%*
can be used to "dispatch" to the appropriate code when %3x%* is one of the
object%4i%*. This is an instance of %2⊗→table-driven↔←%* programming.
The blocks of code dispatched to can be distributed throughout the body of the
%3prog%*. Each block of code will usually be followed by a %3go%* back to
the code involving equation %aUGH%* (above). In fact the argument %3l%* in %aUGH%*
may be %2global%* to the %3prog%*-body.
The effect is to make a %3prog%* which is very difficult to understand.
The LISP %3select%* ({yon(P71)}) will handle many of the possible applications
of this coding trick and result in a more readable program. The case-statement ({yon(P70)})
present in some other languages is also a better means of handling this problem.
The %3go%1 statement is useful if used with discretion. It is a
building block for constructing more complex control regimes, particularly since
the label need not be local to the %3prog%1. We will examine some more complex
kinds of control behavior in {yonss(P187)}.
Now to the problem of translating a %3prog%* into an S-expression representation:
the construct,
←%3prog[[v%41%3; ...; v%4n%3] ... ]%1 will be translated to:
←%3(PROG(V1 ... VN) ... )%*.
Notice that %3prog%* is a special form.
So the body of the %3prog%* must be handled specially by a
new piece of the evaluator.
.TURN OFF "←";
Similarly we must be careful about the interpretation of ←.
We will write %3x ← y%* in prefix form: %3setq[x;y]%*.
We will map this to:
.BEGIN CENTER
%3(SETQ X Y).%*
.END
Notice that %3setq%1 is also a special form.
For if %3x%* and %3y%* have values %32%* and %33%*, for example, then the
call-by-value interpretation of %3setq[x;y]%* would say %3setq[2;3]%*. This was not our
intention. We want to evaluate the second argument
to %3setq%1 while stopping the evaluation of the first argument.
LISP has another assignment-like operator called %3set%1.
Both arguments of this binary operator are evaluated; the value
of the first argument is expected to be a representation of a variable;
that is, the first argument evaluates to a literal atom.
The second argument is a LISP form
and using the value of that form, an assignment is made to the
variable represented by the first argument. Thus we could define
%3setq%1 as:
.BEGIN CENTER;select 3;
%3setq[x;y] = set[quote[x];y]
.END
.TURN ON "←";
As a more complex example, consider %3set[z; plus[x;1]]%1.
If the current value of
variable %3z%* is an identifier, then
%3set[z; plus[x;1]]%* makes sense. Assume the current value of %3Z%*, the
representation of %3z%1, is %3A%*; and assume the current value of %3x%1 is %32%1;
then
the effect of the %3set%* statement is to assign the value %33%* to %3a%*.
Normally when you are making assignments, you want to assign to a %2name%* and
not a %2value%*; thus you will tend to use the %3setq%1 form.
Finally, here is a translation of the body of the %3prog%* version of %3length:%*
.GROUP
%3
.BEGIN TABIT3(10,17,20);
\(LAMBDA (X)
\\(PROG (L C)
\\\(SETQ L X)
\\\(SETQ C 0)
\\A\(COND ((NULL L) (RETURN C)))
\\\(SETQ C (ADD1 C))
\\\(SETQ L (REST L))
\\\(GO A) ))
.END
.APART
%1
Now that assignment statements are out in the open let's re-examine "<=".
We already know ({yon(P89)}) that "<=" does more than simply associate the
right hand side with a symbol table entry of the left hand side; it must also
associate an environment with the function body, and this environment is to be
used for accessing non-local variables. This operation of associating environments
is called forming the %2⊗→closure↔←%*. We thus might be tempted to say:
.BEGIN CENTER;SELECT 3;TURN OFF "←";
f <= λ[[ ... ] ...] %1is%* f ← function[λ[[ ...] ...] ].
.END
Alas, this implementation is still not sufficient as we will see in {yonss(P90)}.
.REQUIRE "PROB7.PUB" SOURCE_FILE;
.REQUIRE "NOTES.EVA" SOURCE_FILE;
.SS(Function definitions,,P202:)
Now that we have developed these more explicit evaluators, we should be able to
exploit some of this additional detail. In particular, more of the detail
of "<=" should be apparent. The effect of %3f#<=#λ[[x]%9x%3]%1 is to put
the definition
of %3f%1 in the global environment, whereas %3label%1 creates a new
dest-block with %3f%1 bound to a %3funarg%1 consisting of %9x%1 and
that constructed environment. Once we leave the environment containing
the %3label%1 definition, that definition is effectively destroyed.
The effect of "<=" is to be global; we could implement %3f%1#<=#%9x%1 as
%3set[f;%9x%3]%1 except that the definition must always go in the
global table.
Note that a "<="-definition could be temporarily superseeded
by a %3label%1-definition to the same name and therefore our search
for a binding for %3f%1 may not short-circuit the environment chain.
Our search strategy is encoded in the %3lookup%1 function; using
the current environment, we find the latest binding for a variable.
With the %3prog%1 evaluator, %3peval%41%1 of {yon(P203)}, things
have become a bit more complicated. Besides finding the definition
we must also determine whether the arguments are to be evaluated.
The device of %3isspec%1 ({yon(P201)}) is sufficient for the
evaluators, but has some difficulties if we wish to allow user-defined
special forms. We will develop a syntax for expressing special forms
at the user level, and then discuss problems of implementation.
We will define
"<%4f%*=" to mean "..is defined to be a special form...".
A special-form definition is also called a %2⊗→fexpr↔←%1;
and a call-by-value definition is called an %2⊗→expr↔←%1.
An fexpr is defined with either one or two formal parameters. The first parameter
is always bound to the
list of %2unevaluated%* actual parameters.
If the definition has a second formal parameter, then the environment at the
point of call is assigned to the second parameter.
This distinction needs to be made when we expect to perform some evaluation of the
formal parameters within the fexpr.
Using the implementation of %3eval%1 discussed on {yon(P205)} we can write
either %3eval[%1<form>;<env>]%1 or %3eval%1[<form>]. In the latter case
the environment that is used is the environment which was current when
the %3eval%1 is performed. Sometimes this is not the desired environment.
Consider:
.BEGIN TABIT2(20,31);SELECT 3;TURN OFF "←";GROUP;
\f1 <%4f%*= λ[[x] prog[[y]
\\y ← 2;
\\return[eval[first[x]]]]
.END;
If we execute %3f1[0]%*, %3x%* will be bound to the list %3(0)%* and
%3eval[first[x]]%* will return %30%* as expected. But if we execute:
.BEGIN TABIT2(20,37);SELECT 3;TURN OFF "←";GROUP;
\y ← 0;
\f1[y];
.END
then %3x%* gets bound to %3(Y)%*, and %3eval[Y]%* will find the
value associated with %3Y%* to be %32%*, and
the value of %3f1[y]%* is %32%*,
rather than the expected %30%*.
Clearly, the problem is that the call on
%3eval%* takes place in the wrong environment.
We can correct this by making the definition with %2two%1 arguments, binding
the second to the environment at the point of call to the fexpr:
.BEGIN TABIT2(20,33);SELECT 3;TURN OFF "←";GROUP;
\f2 <%4f%*= λ[[x;a] prog[[y]
\\y ← 2;
\\return[eval[first[x];a]]]
\y ← 0;
\f2[y];
.END;
The call on %3f2%* will use the environment with %3y%* being %30%*
rather than %32%*.
.GROUP;
Special form definitions are useful in several contexts. Recall that we restricted
LISP call-by-value functions to have a fixed number arguments. For example
if we wish to add four numbers or append three lists we have to write something like:
.BEGIN CENTERIT;SELECT 3;
←plus[x%41%3;[plus[x%42%3;plus[x%43%3;x%44%3]%1 or %3append[append[l%41%3;l%42%3];l%43%3]
.END
.APART
.GROUP;
Since %3plus%1 and %3append%1 are associative operations we would rather write:
.BEGIN CENTERIT;SELECT 3;
←plus[x%41%3;x%42%3;x%43%3;x%44%3]%1 or %3append[l%41%3;l%42%3;l%43%3]
.END
.APART;
.BEGIN SELECT 3; GROUP;TABIT2(10,12);turn off "←";
.P59:
.BEGIN FILL;SELECT 1;
Using a special form, we can allow functions of an arbitrary number of arguments:
We could write %3plus%1 as:
.END
plus <%4f%*= λ[\[l;e] prog[[sum]
\\sum ← 0;
\a\[null[l] → return[sum]];
\\sum ← *plus[sum;eval[first[l];e]];
\\l ← rest[l];
\\go[a]]]
.END
Notice that we could have used %3eval%1 with one argument unless the
variables %3l%1 or %3sum%1 appeared as constitutents of the actual parameters.
.GROUP;
Recalling {yonss(P8)}, we can use <%4f%1= to extend the evaluator.
For example, %3and%* could be defined as:
.BEGIN CENTERIT;GROUP;TABIT2(20,35);
←%3and <%4f%*= λ[[l;e]evand[l;e]] %1where %3evand%* is defined on {yon(P53)}.
.END
.APART;
The implementation of %3g#<%4f%1=#%3λ[[x]%9x%1] requires that we represent the
fact that %3g%1 is a fexpr rather than an expr. The implication of
%3isspec%1 of {yon(P201)} is that we have two tables: one for exprs,
one for fexprs. This complicates the search strategy unnecessarily.
Indeed there should only be one definition or value associated with
a name at any one time, so a single table should be both
necessary and sufficient. We %2do%1 need some way of determining
the calling style to be used when applying the definition.
One way is to slightly revise the %3isspec%1 technique: we use %3lookup%1
for %2all%1 searches, but also have a table relating function-name
with its calling style. One difficulty with this scheme is that
we could not handle anonymous fexpr definitions. Therefore
some versions of LISP replace the character "λ" with another special
character when making fexpr definitions. For example:
.BEGIN CENTER;SELECT 3;
g <%4f%3= λ[[x;y]%9x%3] %c≡%3 g <= β[[x;y]%9x%3].
.END
We would translate such β-expressions into S-expr form,
and extend the evaluator to recognize such constructs.
.CENT(Problems)
I. Define %3list%* as a special form.
II. Write a version of %3peval%1 to handle β-expressions.
.GROUP;
III. Define two special forms, %3de%1 and %3df%1, which will
implement <= and <%4f%1= respectively. The format of these special
forms is identical. For example:
.BEGIN CENTER;
%3de%1[<name>;<formal#parameters>;<body>]
.END
will implement
.BEGIN CENTER;
<name> <= λ[<formal parameters> <body>]
.END
.APART
.SS(Rapprochement: In retrospect,,P90:)
As we have just seen there are
alternatives to some of the LISP-techniques and there are some things
which, in retrospect, LISP could have done better.
By way of review we
sketch the basic LISP evaluator of {yon(P6)}:
%3eval%* plus the additional artifacts for %3label and function%*.
There are two arguments to %3eval%*: a %2form%*⊗↓throughout this section we
will say "form", "variable", "λ-expression", etc. rather than "an S-expression
representation of a" ... "form", "variable", "λ-expression", etc. No
confusion should result, but remember that we %2are%* speaking imprecisely.←, that
is, an expression which can be evaluated; and an association list or
%2symbol table%*. If the form is a constant, return that
form. If the form is a variable, find the value of the variable in the current
environment.
If the form is a conditional expression, then evaluate it
according to the semantics of conditional expressions.
The form might also be a functional argument. In this case evaluation consists of
associating the current environment
with the function and returning that construct as value;
in LISP this is done with the %3funarg%* device.
Any other form seen by %3eval%* is assumed to be a function followed by arguments.
The arguments are evaluated from left-to-right and the function is then applied
to these arguments.
The part of the evaluator which handles function application is called %3apply%*.
%3apply%* takes three arguments: a function, a list of evaluated arguments, and
the current symbol table. If the function is one of the five LISP primitives
then the appropriate action is carried out. If the function is a λ-expression
then bind the formal parameters (the λ-variables) to the evaluated arguments and
evaluate the body of the function. The function might also be the result of a
functional argument binding; in this case apply the function to the arguments
in the saved environment rather than in the current environment.
If we are applying the %3label%1 operator,
recalling {yon(P192)}, we build a %3funarg%1-triple
and new environment such that the environment bound in the triple is the
new environment. We then apply the function to the arguments in this new
environment.
.P227:
If the function has a name
we look up that name in the current environment.
Currently we expect that value to be a λ-expression, which is then applied.
However, since function names are just variables, there is no reason that the
value of a function name could not be a simple value, say an atom, and
%2that%* value can be
applied to the arguments, etc. Indeed examination of %3apply%* on {yon(P69)}
will show that %3apply[X;#((A B))#;#((X#.#CAR)#...#)]%* %2will%* be handled correctly.
The obvious extension of this idea is to allow a %2⊗→computed function↔←%*. That
is, if the function passed to %3apply%* is not recognized as one of the preceding
cases, then evaluate the function-part until it is recognized. Thus
we will allow such forms as:
.BEGIN CENTERIT;SELECT 3;
←((CAR#(QUOTE#(CAR#(A#.#B))))#(QUOTE#(A#.#B)))
.END
.BEGIN GROUP;TABIT1(15);
The addition of computed functions modifies %3apply%1 ({yon(P69)}) slightly:
.SELECT 3;
apply <= λ[[fn;args,environ]
\[iscar[fn] → car[arg%41%*[args]];
\ iscons[fn] → cons[arg%41%*[args];arg%42%*[args]];
\ ... ...
\ islambda[fn] → eval[body[fn];pairlis[vars[fn];args;environ]];
\ %Et%3 → apply[eval[fn;environ];args;environ] ]]
.END
What conceptual difficulties are present in LISP evaluation?
We have seen some computational schemes which
will give values when LISP's call-by-value does not terminate.
Whether these schemes are better is a debatable point.
Programmers tend to think "call-by-value", but it is not clear
whether that is habit
and training or a fundamental point of view towards computation.
.BEGIN GROUP;TURN OFF "←";
A more immediate problem is involved with "<=". We were able to give
counterexamples to interpreting:
.BEGIN CENTER;turn off "←";
%3f <= λ[[x]%9x%1] as %3f ← quote[λ[[x]%9x%3]].
.END
The discussion of binding and environments made %3f#←#function[λ[[x]%9x%3]]%1
a more likely candidate; however this interpretation is also not adequate.
.END
.GROUP
.BEGIN CENTERit;
We attempt to implement ←%3fact <= λ[[x][x=0 → 1; %et%* → *[x;fact[x-1]]]]%1 as:
.END
.BEGIN CENTER;SELECT 3;TURN OFF "←";
fact ← function[λ[[x] ... fact[x-1] ...]
.END
.APART
.BEGIN TABIT3(30,34,38);GROUP;turn off "←";
Consider an initial environment with %3fact%1 defined:
\\E%4i%*
\E%4c%*\|\E%4a%*
\___________
\%3fact%*\|%3λ[[x] ...fact[x-1]] : %1E%4i%*
.apart;
.GROUP;
.BEGIN FILL;
We will demonstrate the inadequacy of two natural interpretations of function
values: direct assignment of value, and assignment of %3funarg%1.
We execute %3foo ← fact%* and %3baz ← function[fact]%1, giving:
.END
\\E%4i%*
\E%4c%*\|\E%4a%*
\___________
\%3fact%*\|%3λ[[x] ...fact[x-1]] : %1E%4i%*
\%3foo%*\|%3λ[[x] ...fact[x-1]] : %1E%4i%*
\%3baz%*\|%3fact : %1E%4i%*
.APART
.GROUP
.BEGIN TURN ON "#";TURN OFF "←";fill
Things don't look quite right; the "intent" of both
%3foo#←#fact%* and %3baz#←#function[fact]%1
was to make %3foo%* and %3baz%1 synonymous with %3fact%*. That clearly is not the case
though the right thing happens if we were now to evaluate an expression
involving %3foo%* or %3baz%1. The problem is that it happens for the wrong reason
even though the occurrence
of %3fact%* in the body of %3foo%* will find the right definition of %3fact%*;
an application of %3baz%1 will find the definition of %3fact%1.
One more step will lead to disaster: %3fact#←#quote[λ[[x]x]]%*.
.END
\\E%4i%*
\E%4c%*\|\E%4a%*
\___________
\%3fact%*\|%3λ[[x]x]
\%3foo%*\|%3λ[[x] ...fact[x-1]] : %1E%4i%*
\%3baz%*\|%3fact : %1E%4i%*
.END
Now we have really lost. Though it is perfectly reasonable to redefine %3fact%*
--#it is only a name#-- our intent was to keep %3baz%1 and %3foo%* as realizations
of the factorial function. This intent has not been maintained.
.BEGIN CENTER;
%3fact%* <= %3λ[[x] ...fact[x-1]]%1 is quite different from:
%3foo%* <= %3λ[[x] ...fact[x-1]].
.END
To attempt to understand what is happening we
look at assignments to %2simple%*
variables rather than functional variables. It is clear how the environment
should change during the execution of the sequence:
.BEGIN TABIT1(30);TURN OFF "←";SELECT 3;GROUP;
\x ← 3
\y ← x
\x ← 4
.END
Let's try something slightly more complicated:
.BEGIN TABIT1(30);TURN OFF "←";SELECT 3;GROUP;
\x ← 1
\x ← x%82%* - 6
.END
Here we simply assign %31%* to %3x%*; then while we are evaluating the right
hand side of the next statement, we look up the current value of %3x%*,
perform the appropriate computations, and finally assign the value %3-5%*
to %3x%*. Compare %2this%* to the %3fact%* definition; there we made a point
of not "evaluating" the name %3fact%* in the right hand side of "<=".
.P109:
.BEGIN TURN ON "#";TURN OFF "←";
Notice that %2after%* an assignment like %3y#←#x%* has been executed, then
equality holds between the left-hand and right-hand quantities ⊗↓Indeed, a
logic of programs due to C.A.R. Hoare exploits this fact in interpreting
assignment.←.
Let's look more closely at the relationship between "←" and "=".
Consider %3x#←#x%82%*#-#6%1. After the assignment is made, equality
does %2not%* hold between left- and right-hand sides.
Now consider %3x#=#x%82%*#-#6%1.
Interpreting this expression as an equation, not as an expression
whose value is true or false depending on the current value of %3x%*, we find
two solutions: let %3x%* have value %3-2%* or %33%*.
If we examine our "definition" of %3fact%* in %2this%* light, interpreting
"<=" as being related to "=" rather than "←", then we are faced with an equation
to be solved, only now the form of the solution is a %2function%* which satisfies
the equation. There are frequently many solutions; there may be no solutions.
In our case there is at least one solution: the usual factorial function.
So what we %2should%* write is something more like:
.BEGIN CENTER;SELECT 3;
fact ← %1a solution to the equation%*: f = λ[[x][x=0 → 1; %et%* → *[x;f[x-1]]]].
.END
.END
That is,
the %2real%* intent of the recursive definition of %3fact%* was to define a
function to effect the computation of factorial and then to %2name%* that
function %3fact%*.
Questions of when solutions exist, how many, and which are the "best"
solutions is a topic of much current research ⊗↑[Man#74]↑.
.BEGIN TURN OFF "←";
LISP has two ways of assigning values to functions: %3label%1 and <=.
The use of %3label%1 manufactures a new "knotted" environment but
does not always find the mimimal solution ⊗↑[Gor#73]↑.
The evaluation of %3f#<=#%9f%1 is interpreted as %3f#←#quote[%9f%3]%1, where
the assignment is make in the global environment.
LISP's solutions are sufficient for most definitions, but
this discussion should make clear the necessity of distinguishing between
assignment and equality; unfortunately, many programming languages use notations
which tend to obscure these differences.
.END
.CENT(Problems);
I The %3foo-fact-baz%1 example of this section
is not described in LISP. Can you write a LISP
program without %3prog%1s which will also exemplify this difficulty?
.ss(The LISP machine,,P175:)
The LISP definitions and expression which we have been writing are expressed
in a language called the %2⊗→meta-language↔←%1, and
the LISP expressions called ⊗→M-expressions↔← or M-exprs.
The most primitive data structures of LISP are called S-expressions.
We have seen that it is possible to represent M-expressions as S-expressions,
and
indeed, that significant results are obtained from such a mapping.
The %2programming language%*, LISP, uses the S-expr translation
of the LISP algorithms.
As we move from the more formal aspects of LISP to the practical
details of implementations and applications, we should examine
some of the features of LISP as a programming language.
Since proper input to LISP programs are S-exprs and since we are writing
LISP programs in S-expr form, then on input, data and program are
indistinguishable in format; i.e., they are both binary trees.
For evaluation, programs must have a very special structure,
but program and data are both S-exprs just as in most hardware machines
the contents of locations which contain data are indistinguishable
in format from locations which contain instructions.
On a typical contemporary machine it is the way in which a location is accessed
which determines how the contents of that location are interpreted.
If the central processor accesses the location via the program
counter, the contents are assumed to be an instruction. That same
location can usually be accessed as part of a data manipulating
instruction. In that case, the contents are simply taken to be data.
LISP is one of the few high-level languages which also has this property.
It gives the programmer exceptional power.
The central processor of a "LISP machine" is ⊗→%3eval%*↔←⊗↓Several "LISP machines"
have been proposed or actually built including: ⊗↑[Got#xx]↑,
⊗↑[Gre#74]↑, ⊗↑[Deu#73]↑, and ⊗↑[Bar#71]↑←.
If %3eval%* references an S-expr via its "program counter", then
that S-expr is decoded via the internals of %3eval%*. If an S-expr is
referenced as an argument, then it is
taken as data⊗↓This goes for %3funarg%1s as well: until they're applied,
they're data.←. The identity of program and data is not fixed even within
the execution; a LISP program can %3cons%1#up a data structure which can then
be interpreted as program either by explicitly calling %3eval%1 or implicitly
by appearing in the function-position of an application.
.GROUP;
The simplest way to communicate with such a machine is to read an
S-expr translation of a LISP expression into memory, evaluate the expression, and
print the result. Several implementations of LISP use a variant
of this "%3read-eval-print%1" loop:
.BEGIN SELECT 3;TABIT3(30,34,36);GROUP;
\prog[[]
\\a\print[eval[read[];( )]];
\\\go[a]].
.END
.APART
Note the similarity to %3loop%1 on {yon(P208)}.
The details of the input and output operations in LISP are discussed in the next
chapter.
.P132:
List-notation
is a recognizable input and the output from LISP programs will be converted
to list-notation wherever possible.
But that's where
things stand. LISP will allow you to manipulate the %2representation%*
of the lists. The LISP S-expr operations like %3car, cdr%*, and
%3cons%* operate without complaint on lists, even though we have
repeatedly said that these functions are S-expression functions.
LISP's attitude to %3car%1 and %3cdr%1 is similar to its
treatment of %3go%1s and labels: these are useful primitives out of which
to build "real" tools; either real data structures or real control structures.
However
the effect of this behavior is to present the potential
LISP user with an incredible potentiality for generating programming errors.
It also removes from the user a powerful debugging tool. If we
write programs such that the type of each data object must be given⊗↓For example,
in {yonss(P41)} the types of the arguments to %3diff%* should be <poly> and
<variable>, %2not%* list and atom.←,
and if we write each function such that the process of binding arguments
to values must check that the type of the actual parameter agrees with
the type of the parameter of the function, then a very large number
of programming errors can be located almost as soon as they occur.
You can think of the parameter-passing mechanism as sort of a "fire-wall",
which will attempt to contain the deviant behavior within the particular
function.
Any function which gets called has a right to expect
that it will be called with reasonable values. Part of being
reasonable is having the correct number of arguments given to it;
%3cons[A;#B;#C]%* is as bad as %3cons[A]%*. Part of being reasonable
is having the right kind of arguments; we don't expect reasonable results
from %3sub1[A]%*.
We should not expect that the functions are sufficiently omniscient
to convert an argument of the wrong kind into a proper one.
If a function is written to expect an argument of type %3polynomial%*
then it should complain if it receives an argument of type
%3list%* even though the current representation for polynomials
might be special instances of lists.
Many current languages do indeed offer such omniscience. Fortran
calls this service "conversion"; Algol68 calls it %2"⊗→coercion↔←"%*.
However if
each function accepts whatever argument
it is given and attempts to use it in its computation, then the first
indication of trouble will occur when a primitive function is called and
causes some error deep within the implementation.
Typically this indication of error is way past
the actual source of the difficulty.
The alternative is to
explicitly code tests into the entry code of each function definition;
but
that's an expensive use of the programmer's time and the computation time.
What typically happens is that the tests are left out and debugging
soon follows.
As with most areas of programming, coercion is not a black-or-white issue.
A strong type structure can hinder as well as help. Requiring explicit
declarations and directions for conversion is frequently annoying. Indeed
several important programs should be type-free. In particular the
debugging programs must be able to freely access all parts of the representation
of program and data without regard for type.
LISP's position is that it is the user's responsibility to handle
all type restrictions by programmed tests.
LISP has no capability to
define and maintain abstract data structures. However
people have begun investigations of "typed LISP" ⊗↑[Car#76]↑, and
some implementations of LISP ⊗↑[Int#75]↑ give some aids
in constructing and maintaining typed data structures.
Symbolic expressions are the only real data structure; we almost
have sequences as a data structure, and the necessary ingredients
are there to build abstract data structures. But the question of
integrity in using such defined data structures is left totally in the
hands of the programmer. In summary, LISP is an excellent tool for building
more complex systems. This is one of the reasons that LISP has maintained
its position as the "machine language" for artificial intelligence.
.SEC(Implementation of the Static structure of LISP,static structure,,P188:)
.TURN ON "←","α";
%1
.SS(Introduction)
The material in the previous chapters has been rather abstract.
This chapter begins a discussion of the actual mechanisms which
underlie a typical implementation of LISP. However the importance of the
techniques we will describe extends far beyond the implementation of
this particular language. Most of the ideas involved in our implementation
are now considered standard system programming techniques and
are common tools in language design. LISP is particularly well-suited
to the task of explicating these ideas since many find their origins in
the first LISP implementation.
We will begin our discussion of implementation with an analysis of
storage regimes for S-expressions. As with the more abstract discussions
of representations,
the "concrete" representation which we pick for our data structures (S-expressions)
will have direct bearing on the implementation of the primitive constructor
(%3cons%*), selectors (%3car, cdr%*) and predicates (%3atom, eq%*) of LISP.
Since we are now attempting to become more practical,
we must worry about the efficiency of the implementation
and we must
worry about input/output mechanisms so that the language may communicate with the
external world.
The present chapter will develop a picture
of the %2static%* structure of an implementation, or to be more
graphic, this chapter describes the memory of a LISP machine.
The next chapter discusses the %2dynamic structure%* of LISP;
that is, the control structures necessary to
evaluate expressions involving recursive functions and other LISP
constructs.
Throughout these discussions we will remain as abstract as
possible without losing too much detail.
We will
describe the "logical" structure of a LISP machine, even though
a more efficient implementation might map differing logical structures onto
the same physical structures by utilizing machine-dependent techniques.
.SS(Representation of S-expressions)
We previously have expressed the dotted pair %3(A#. (B#.#C))%* as:
.BEGIN SELECT g;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#";
.TURN ON "∞" FOR "%";
/\
/ \
/ /\
/ / \
∞3A∞* ∞3B∞* ∞3C∞*
.END
or occasionally (on {yon(P102)}) as:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α";
.TURN ON "∞" FOR "%";
⊂αααπααα⊃ ⊂αααπααα⊃
ααααα→~ # ~ #αβααααααααααα→~ # ~ # ~
%αβα∀ααα$ %αβα∀αβα$
↓ ↓ ↓
∞3A∞* ∞3B∞* ∞3C∞*
.END
This second style of graphical representation has a direct representation
in the storage layout of our machine.
Each "double-box" will be represented by a machine location and each
arrow will be represented as a %2⊗→pointer↔←%* to another machine
location.
Notice that each box contains two pointers; therefore each corresponding
machine location, %bloc%*,
will be interpreted as containing two machine addresses⊗↓An actual hardware
machine may not be of sufficient word-size to accomodate two
addresses; in this case, several real words may be needed to represent
one LISP word. For example, the PDP-11 (16#bit#words) implementations typically
use two machine words (⊗↑[Har#75]↑), and micro-processor versions (8#bit#words)
may use four words (⊗↑[Pag#76]↑). In {yonss(P224)} we discuss a special
compact representation of LISP cells.←. The left-hand address
will represent the %3car%*-branch; the right-hand address
will represent the %3cdr%*-branch:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;group;
.TURN OFF "%","#","α";
.TURN ON "∞" FOR "%";
⊂αααααπααααα⊃
loc ~ car ~ cdr ~
%ααααα∀ααααα$
.END
The pointers will either reference atoms or
point to other two-pointer boxes.
Literal atoms
--#like %3A,#B,#C%*#-- will also be represented in machine locations, only here the
contents of each location will be an encoding of the name of the atom.
Obviously the
contents of such a location must not be interpreted as pointers.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;group;
.TURN OFF "%","#","α";
.TURN ON "∞" FOR "%";
⊂αααααααααααααααααααααα⊃
loc ~ rep. of literal atom ~
%αααααααααααααααααααααα$
.END
To help keep track of the different uses of machine locations we
will partition our machine's memory into two disjoint spaces: %2⊗→pointer space↔←%*,
which will contain two-pointer cells; and %2⊗→atom space↔←%*, which will
contain information like atoms which should not be interpreted as
pointers.
Thus if the first box in our example were represented by location %b100%*
and the second were represented by location %b405%*, and the atoms
%3A%*, %3B%*, and %3C%* were represented by the numbers %b40%*, %b41%*, and
%b42%*, and were to be found
in locations %b710%*, %b762%*,
and %b711%*, respectively, then the following picture beginning at location
%b100%* could represent the dotted pair %3(A#.#(B#.#C))%*.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;group;
.TURN OFF "%","#","α";
.TURN ON "∞" FOR "%";
⊂αααααπααααα⊃
100 ~ 710 ~ 405 ~
%ααααα∀ααααα$
...
⊂αααααπααααα⊃
405 ~ 762 ~ 711 ~
%ααααα∀ααααα$
...
⊂ααααααααααα⊃
710 ~ 40 ~
εαααααααααααλ
711 ~ 42 ~
%ααααααααααα$
...
⊂ααααααααααα⊃
762 ~ 41 ~
%ααααααααααα$
.END
Thus the left half of location %b100%* points to the representation of the
atom %3A%* and the right half points to the representation of the dotted pair
%3(B#.#C)%*.
Notice too that given the entry point into the representation --location %b100%*
in the example-- we can unambiguously discover the S-expr being represented.
This representation of Symbolic expressions is a special case of a
scheme called %2⊗→linked list structure↔←%*. The term "linked"
refers to the fact that to find succeeding elements in the representation
we must follow the explicit pointers as opposed to, say, merely
incrementing an array pointer.
The phrase "list structure" describes an arbitrary interconnection of these
two-pointer nodes, including self-referential structures. We will discuss such
general structures later; for the moment we restrict such constructions
to LISP trees: no intersecting branches.
The particular brand of linked list structure which we have demonstrated is called
%2⊗→singly linked↔←%*.
The adjective "singly" means that only %2one%* pointer
is stored as the representation of
the arrow, %b→%*.
In the case of LISP, the terminology means that the representation of each pointer
only tells us how to find succeeding elements in the structure. For
example, if we were looking at location %b405%* the representation
tells us how to find the %3car%* or %3cdr%*; they're at %b762%* and
%b711%* respectively. But if we wanted to find the %2predecessor%*
of %b405%* in this representation it would require some further calculation.
We would have to start at the beginning of the S-expr representation
and look for a location such that its %3car%* or %3cdr%* is the desired cell.
If our use of S-exprs required frequent discovery of such predecessors
then we might consider an even more complex linked representation which
would also contain information about the predecessor of each node,
essentially representing %b→%* as %b↔%*.
.GROUP;
.P164:
For example:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "←"
.TURN OFF "%","#","α";
.TURN ON "∞" FOR "%";
⊂αααπααα⊃ ⊂αααπααα⊃
α←→α→~ # ~ #←βααα←→αααααα→~ # ~ # ~
%αβα∀ααα$ %αβα∀αβα$
↑ ↑ ↑
↓ ↓ ↓
∞3A∞* ∞3B∞* ∞3C∞*
.END
.APART
One such representation is called %2⊗→doubly-linked list structure↔←%*.
In this representation
of LISP trees we could store %2three%* pieces of information with each
non-terminal node:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;group;
.TURN OFF "%","#","α";
.TURN ON "∞" FOR "%";
⊂ααααααααααααα⊃
~ predecessor ~
loc εααααααπααααααλ
~ car ~ cdr ~
%αααααα∀αααααα$
.END
.BEGIN TURN OFF "←";
Note that LISP trees always %2do%1 have
unique predecessors. In the case of list-structure, unique predecessors
do not always exist. Compromises exist in some applications: some data structures
can be doubly-linked, allowing fast traversal but requiring more space;
while other data structures are singly linked, requiring less space, but
requiring more time to traverse (⊗↑[Gua#xx]↑); still other structures may have more
compact representation as arrays or numbers.
For example, a typical representation of a vector,
or sequence of fixed length, is to store the elements sequentially
in memory⊗↓We will discuss these more detailed representations in {yonsec(P210)}.←.
Since each element in this structure has at most one successor
we can use the sequential addresses as implicit pointers to retrieve
successive elements. A general S-expr has two successors; thus
the implied linear addressing scheme of most machine memories is insufficient
as it stands; LISP uses linked allocation. Again there are compromises.
For example the following memory representation is valid for LISP trees:
for any location %3n%1, find its successors at locations %32n%1 and %32n+1%1;
note that the predecessor of any cell is unique.
Each location must contain an indication of whether or not it is an atom.
The remaining contents of a location is available for data⊗↑[Ber#71]↑.
.END
.P140:
We will frequently wish to refer to several different S-exprs simultaneously;
for example, when we are talking about the implementation of the
function %3cons%* we will be manipulating the representations of two
S-exprs.
Similarly we will want to refer to several pieces of a single complex
S-expr; for example we might wish to "put a finger" at a specific point
in a structure and then, depending on the result of a computation
on some sub-part, move the "finger" either left or right.
To facilitate such discussions we will assume the existence
of a set of pointer registers: %bPt%41%1, %bPt%42%1,#...,#%bPt%4n%1.
Thus, using the above example, the following represents %bPt%41%1 pointing
at %3(B#.#C)%* and %bPt%42%1 pointing at the atom %3A%*:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;group;
.TURN OFF "%","#","α";
.TURN ON "∞" FOR "%";
Pt∞41∞b Pt∞42∞b
⊂αααααααα⊃ ⊂αααααααα⊃
~ 405 ~ ~ 710 ~
%αααααααα$ %αααααααα$
.END
.P159:
Implicit in our representation is the assurance that we can differentiate between
locations in atom space and locations in pointer space. For example, assume each of
our locations can hold six digits and assume we will
store a numeric atom as its corresponding number. Then the atom
%3762711%* would be stored as:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;group;
.TURN OFF "%","#","α";
.TURN ON "∞" FOR "%";
⊂αααααααα⊃
~ 762711 ~
%αααααααα$
.END
Since this is exactly the contents of location %b405%*⊗↓The vertical
bar doesn't appear in the machine's memory.← some confusion
is possible: is the contents a number or is it two pointers?
A typical trick is to partition memory such that particular portions of the
address space correspond to each of the logical spaces:
atom space or pointer space. In our example we could assume that addresses
below %b700%* are locations for pointers, while addresses %b700%* and
above contain atoms. Thus the representation of %3762711%* would
appear in a location with address %b700%* or greater.
Though our memory system is not completed yet, we %2do%* have enough
structure to begin a discussion of the implementation of some of the
primitive LISP operations.
.CENT(Problems)
What problems do you foresee in using the double-linking scheme for
representing LISP's S-exprs?
.SS(Representation of LISP primitives,,P26:)
.SELECT 1;
Now that we have some of the representational problems for Symbolic expressions
reasonably well in hand we will look at the implementation of the
LISP primitives. We will examine %3car, cdr, eq,%* and %3atom%*
in this section, leaving %3cons%* for later.
We need to understand how these primitives are to obtain their parameters and
how they are to return their values.
Recall our discussion of environments and destinations in {yonss(P209)}.
An environment chain was constructed
by linking destination blocks whose value slots have been filled.
A dest-block was
created when we recognized a function application⊗↓If
the application supplied an incorrect number of arguments no dest-block is built;
the debugging routine is called. Several existing implementations supply
missing arguments with %3NIL%1 or evaluate and discard extra arguments.
This treatment of improper calling sequences ignores one of the most
common sources of bugs in LISP programs (⊗↑[Mot#76]↑).←.
The name components of a block are either the λ-variables in the case of
a λ-application or are system-generated names in the case of primitive application;
the value-slots of a dest-block
received the evaluated actual parameters.
When a dest-block
was filled, it was chained onto the front of the environment and we were ready
to call the function. Thus
the first block on the environment chain was the local symbol table.
The function was expected to return its
value to a designated dest-block, and then return to the interrupted computation.
So, on entrance to a primitive we have access to at least two structures:
a destination block and the environment chain.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
∞3dest∞* ∞3env∞*
~ ~
~ ⊂ααααααα⊃ ~ ⊂ααααααα⊃
%α→ ~ #αβα⊃ %αα→ ~ #αβαα ### →
εαααπαααλ ↓ εαααπαααλ
# # # # # #
~ ~ ~←$ ~ ~ ~
εαααβαααλ εαααβαααλ
~ ~ ~ ~ ~ ~
# # # %ααα∀ααα$
εαααβαααλ
~ ~ ~
%ααα∀ααα$
.END
.BEGIN GROUP;
Here is how %3car%* uses these structures:
.BEGIN INDENT 10,10;
Let %3val%1 be the value-part of the first entry in the local table.
If %3val%1 is an atom then %3car%* is undefined; the implementation
should send a message to the debugging package (see {yonss(P168)}).
If %3val%1 is %2not%* atomic, it has a left- and right-hand side.
We should send the %2left%*-hand side of %3val%1 to the value part of the
slot pointed to in the dest-block.
.END
.END
.GROUP;
For example:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
∞3dest∞* ∞3env∞*
~ ~
~ ⊂ααααααα⊃ ~ ⊂ααααααα⊃
%→ ~ #αβα⊃ %→ ~ #αβαα ### →
εαααπαααλ ↓ εαααπαααλ
# # # ~ ~102~
~ ~ ~←$ %ααα∀ααα$
εαααβαααλ
~ ~ ~
# # #
εαααβαααλ ⊂αααπααα⊃
~ ~ ~ 102 ~204~ 22~
%ααα∀ααα$ %ααα∀ααα$
.BEGIN CENTER;SELECT 2;
Before
.END
⊂ααααααα⊃ ⊂ααααααα⊃
~ #αβα⊃ ~ #αβαα ### →
εαααπαααλ ↓ εαααπαααλ
# # # ~ ~102~
~ ~204~←$ %ααα∀ααα$
εαααβαααλ
~ ~ ~
# # #
εαααβαααλ ⊂αααπααα⊃
~ ~ ~ 102 ~204~ 22~
%ααα∀ααα$ %ααα∀ααα$
.END
.BEGIN CENTER;SELECT 2;
After
.END
.BEGIN CENTER;SELECT 2;
Example for %3car%*
.END
.APART
For successful completion %b%3car%1 expects its actual parameter
to be pointing
to a node in pointer space; otherwise we get an error. If the operation
is successful then the dest-slot is changed to point to whatever was pointed to
by the left-hand side of the selected cell.
The description of %3cdr%* is sufficiently similar that we leave it to the
reader.
.GROUP;
On {yon(P159)} we described the internal structure of LISP atoms. Using that
representation we can give a simple implementation for the predicate %3atom%*:
.BEGIN INDENT 10,10;
Does the actual parameter point into that area reserved for atom names?
If so, send
a representation of truth as value, otherwise send a representation of
false.
.END
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
∞3dest∞* ∞3env∞*
~ ~
~ ⊂ααααααα⊃ ~ ⊂ααααααα⊃
%→ ~ #αβα⊃ %→ ~ #αβαα ### →
εαααπαααλ ↓ εαααπαααλ
# # # ~ ~710~
~ ~ ~←$ %ααα∀ααα$
εαααβαααλ
~ ~ ~ ⊂ααααααα⊃
# # # 710 ~ "A"~
εαααβαααλ %ααααααα$
~ ~ ~ ⊂ααααααα⊃
%ααα∀ααα$ 714 ~ "T"~
%ααααααα$
∞1We are writing ∞b"A"∞1 instead of the numeric encoding. Thus ∞b"A"∞1 is really ∞b40∞1.∞b
.BEGIN CENTER;SELECT 2;
Before
.END
⊂ααααααα⊃ ⊂ααααααα⊃
~ #αβα⊃ ~ #αβαα ### →
εαααπαααλ ↓ εαααπαααλ
# # # ~ ~710~
~ ~714~←$ %ααα∀ααα$
εαααβαααλ
~ ~ ~
# # #
εαααβαααλ
~ ~ ~
%ααα∀ααα$
.END
.BEGIN CENTER;SELECT 2;
After
.END
.BEGIN CENTER;SELECT 2;
Example for %3atom%*
.END
.APART
Notice that we did not need to examine the contents of location %b710%*;
it was sufficient to know that the location was between predetermined bounds.
If the actual parameter was not pointing at an atom we would have returned a pointer to
a location containing %b"NIL"%1.
.GROUP
Finally we describe an implementation of %3eq%*.
.BEGIN INDENT 10,10;
Do both acutal parameters
both point into atom space?
If not the result is undefined. If they %2do%* then do they reference the same atom?
We can determine this latter condition in two ways: first, they
might point to two different locations in atom space; we would have to examine the
contents of those locations; if they agreed then %3atom%* should return a
representation of truth. A more satisfactory solution is to store
each atom %2uniquely%*; one location will be reserved for %b"A"%1, etc. Now
all %3atom%* need do is make sure that both slots point
into atom space %2and%* point to the same location. Thus no additional
memory reference is required. From now on we will assume that all atoms are
indeed stored uniquely.
.END
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
∞3dest∞* ∞3env∞*
~ ~
~ ⊂ααααααα⊃ ~ ⊂ααααααα⊃
%→ ~ #αβα⊃ %→ ~ #αβαα ### →
εαααπαααλ ↓ εαααπαααλ
# # # ~ ~710~
~ ~ ~←$ εαααβαααλ
εαααβαααλ ~ ~710~
~ ~ ~ %ααα∀ααα$
# # # ⊂ααααααα⊃
εαααβαααλ 714 ~ "T"~
~ ~ ~ %ααααααα$
%ααα∀ααα$ ⊂ααααααα⊃
710 ~ "A"~
%ααααααα$
.BEGIN CENTER;SELECT 2;
Before
.END
⊂ααααααα⊃ ⊂ααααααα⊃
~ #αβα⊃ ~ #αβαα ### →
εαααπαααλ ↓ εαααπαααλ
# # # ~ ~710~
~ ~714~←$ εαααβαααλ
εαααβαααλ ~ ~710~
~ ~ ~ %ααα∀ααα$
# # # ⊂ααααααα⊃
εαααβαααλ 714 ~ "T"~
~ ~ ~ %ααααααα$
%ααα∀ααα$ ⊂ααααααα⊃
710 ~ "A"~
%ααααααα$
.END
.BEGIN CENTER;SELECT 2;
After
.END
.BEGIN CENTER;SELECT 2;
Example for %3eq%*
.END
.APART
We still have a representational problem to resolve: if we wish to represent
the number
%340%1 as %b40%1 then we have a conflict with our proposed representation
of the atom %3A%1 as %b40%1. {Yonss(P128)} resolves this conflict.
.CENT(AMBIT/G)
.P158:
Before looking more deeply into the detailed nature of atoms,
we should explore a possibility for describing LISP's primitives.
We have described them by example, without much loss in clarity.
It would be more
pleasing if we could describe each primitive in more general terms.
Fortunately we can give less machine-dependent characterizations
using a micro-version of a graphical language called ⊗→AMBIT/G↔←⊗↓An
ambit is half aminal, half hobbit. AMBIT/G also is an acronym
for Algebraic Manipulation By Identity Transformation/Graphical.←.
When faced with explaining a complex structure-manipulating
program to someone, we draw pictures. We do it in LISP to describe the data
structures and in this section we have given graphical descriptions
of the LISP primitives using examples. AMBIT is an extension of
both of these ideas; it
is a graphical language for the description of both
data and algorithms.
The basic statements of the language are %2pattern-match-and-replacement%*
rules.
Patterns are described as combinations of shapes and solid arrows.
If an instance of a pattern can be
found in the current state of the computation, then we will replace
that instance with a new pattern.
The only kind of replacement we will allow is the %2swinging%*
of an arrow so that its head moves from one node to another.
Thus the new pattern differs from the old only in the positioning
of some of the arrows.
Where the arrow head strikes a node is immaterial; the origin of the tail
%2is%* important.
Dashed arrows show replacements to be made if the pattern matches.
Portions
of the shapes marked with "?" are "don't-care" conditions.
.GROUP;
For example, here's %3car%*:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
∞3dest∞* ∞3env∞*
~ ~
~ ⊂ααααααα⊃ ~ ⊂ααααααα⊃
%→ ~ #αβα⊃ %→ ~ #αβαα ### →
εαααπαααλ ↓ εαααπαααλ
# # # ~ ~ # ~
~ ~ ~←$ %ααα∀αβα$
~ ~ # → - - - - → ~
εαααβαααλ ↓ ↓
~ ~ ~ ⊂αααπααα⊃
# # # | ~ # ~ ? ~
εαααβαααλ %αβα∀ααα$
~ ~ ~ ↓ ↓
%ααα∀ααα$ ⊂ααααααα⊃
%→ - - → - - →~ ? ~
%ααααααα$
.END
.BEGIN CENTER;SELECT 2;
algorithm for %3car%*
.END
.APART
This AMBIT diagram contains equivalent information to the previous
%2example%* of %3car%*, but the extraneous details of which
addresses are involved have been stripped away.
We will use such diagrams occasionally when they will contribute to clarity.
.CENT(Problem)
Give an AMBIT diagram for %3eq%1.
.SS(A few programming techniques,,P214:)
There are a few useful and practical LISP programming techniques
which take advantage of some of the tricks of implementation.
These tricks are supported in most implementations and are useful
enough that they should be documented as programming language features.
.BEGIN INDENT 0,5;GROUP;
%21.%1 In most implementations of %3eq%1
the check for atom-ness is suppressed and a simple address comparison
is made. Non-atomic S-expressions are not usually stored uniquely⊗↓See
the problem on hash consing on {yon(P154)}.←; Thus
in most implementations
←%3eq[(A . B);(A . B)]%* is usually false, but
←%3eq[x;x] %*is usually true for any %3x%*.
.BEGIN INDENT 5,5;
Please note that we are %2not%* changing the definition of %3eq%*.
%3eq%* is still undefined for non-atomic arguments.
The preceding remarks deal only with the usual implementation of
%3eq%*.
.END
.APART
.GROUP;
%22.%1 The implementation of the truth values %et%1 and %ef%1 is
usually simplified, mapping %ef%1 onto %3NIL%1, but allowing
anything %2but%1 %3NIL%1 as a representation of %et%1.
This allows several related tricks:
←any expression may be used as a predicate, and
←used as a predicate, %3not[null[l]]%1 has the same effect as %3l%1.
.APART;
.GROUP;
%23.%1 Several implementations of conditional expressions allow
"p%4i%1" as an abbreviation for "p%4i%1#→#p%4i%1". The computational effect is
the same, but p%4i%1 is only evaluated once.
.END
.GROUP;
For example, an extended version of the predicate %3member%1 can be written.
This "predicate" will return %ef%1 if no matching element is found, otherwise
it will return the list beginning at the match.
.BEGIN SELECT 3;TABIT1(14);
mem%9'%3 <= λ[[x;l]\[null[l] → %ef%3;
\ equal[x;first[l]] → l;
\ %et%3 → mem%9'%3[x;rest[l]]] ]
.END
.APART;
Such a feature is useful in contexts where we wish to test for the existence
of an object and, if a match is found, do something with the object.
The use of %3assoc%1 occurs in similar contexts.
.CENT(Problem)
I. The application of these tricks tends to lead to
somewhat unesthetic programs. Typically we have to test for existence then,
assuming an instance was discovered, we have to perform further computation
on that instance⊗↓If no further computation is necessary, trick No.#%23%1
suffices.←. Constructs like:
.BEGIN CENTER;SELECT 3;
[it ← test[object] → smash[it]; ...]
.END
arise. The objectional aspect involves the variable %3it%1. The variable %3it%1 is not local
to the conditional expression. Either %3it%1 is global: an unnecessarily
gratuitious side-effect; or %3it%1 is bound by λ-expression or a %3prog%1.
In either case the binding is too far removed from its useage. Sussman and Steele
⊗↑[Sus#76]↑ suggest the construct:
.BEGIN CENTER;
%3test[<form%41%1>; %3λ[[x]%1<form%42%1>]; <form%43%1>],
.END
with the following semantics: evaluate <form%41%1>; if it gives
a value other than %ef%1 then
apply the second argument, a unary function, to that value. Otherwise
evaluate <form%43%1>.
Recast the %3isspec%1-argument of {yon(P219)} in terms of %3test%1.
Give an implementation of %3test%1 by extending one of our evaluators.
.SS(Symbol tables and property-lists,,P128:)
Since we are looking at implementation details,
and since manipulation of symbol tables is such a central issue
in evaluation, we should look more closely at symbol tables
and their organization.
We have seen two fundamentally different implementations of symbol
table orgainzation: deep binding and shallow binding. We should examine the various
advantages of these implementations, and probe deeper into the details of
their implementation.
If the number of entries associated with an atom is small then shallow binding
may be advantageous.
If the number of entries associated with an atom is very large then
the shallow binding technique may be too costly and deep binding or yet
another organization may be required (⊗↑[McD#75]↑).
.P139:
Recall our discussion of %3getval%1 and %3addval%1 in {yonss(P134)}.
These two functions were developed to describe shallow binding, but
they are illustrative of a more general idea. In symbol manipulation
and symbolic programming, we often want to be able to associate a set
of data with an item under discussion. For example, an algebraic simplifier
would like to know whether a specific operator is associative; if so,
certain simplifications are valid. We could maintain a list of all associative
operations and require that the simplifier check that list. But since
associativity is a property of the operator it seems more natural to
associate the property with the operation. Search considerations also
arise if the list of operations is long. Indeed the same considerations
we expressed in the discussion of shallow binding arise here;
and the same solution is applicable.
We generalize the idea expressed in %3getval%1 and %3addval%1,
allowing the association of an arbitrary collection of properties and
values.
Most implementations of LISP allow the association of
arbitrary sequences of %6property-value%* pairs with an atom.
With each atom we will
associate a list called the %6⊗→property-list↔←%* or p%6-list.%*
The names %2attribute%1, or %2indicator%1, are sometimes used as synonyms
for property. An atom is frequently called a %2carrier%1 of the properties.
A property list is a table of properties and property#values. The size of the
property list is not fixed, but can shrink and grow during a computation.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂αααααααπααααα⊃
~ prop1 ~ val1~
εαααααααβαααααλ
~ prop2 ~ val2~
# # #
εαααααααβαααααλ
~ propn ~ valn~
%ααααααα∀ααααα$
.END
We have seen an identical diagram in {yonss(P220)}; property lists
turn out to be a very effective means of modelling data bases⊗↓They
are by no means the %2only%1 or %2best%1 way of representing
every data base. See ⊗↑[McD#75]↑.←.
As abstractions, property lists are symbol tables. The name-entries
are properties and the value entries are the property values.
We identify an atom with its property list.
.GROUP;
For example,
if we wished to represent the "+"-operation as the atom %3PLUS%1
and wished to signify that the operation is associative and binary,
the atom %3PLUS%1 might be represented as:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
PLUS ⊂αααααααπααα⊃
~ assoc ~ T ~
εαααααααβαααλ
~ arity ~ 2 ~
%ααααααα∀ααα$
.END
.APART
In these kinds of applications, we are using
properties associated with atom, thought of as a data structures.
These same techniques are
applicable when we think about atoms as representations of variables as
used in the evaluation process.
.GROUP;
In fact, an atom can simultaneously be used as a carrier of a value and
can also have a property list:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
E4
⊂αααπααα⊃
⊂←β X ~ 3 ~
↓ εαααβαααλ
# # # E3
↓ %ααα∀ααα∀→αα→⊂αααπααα⊃
~ # # #
↓ εαααβαααλ
%→ααα→π←αα←ααα←β X ~ A ~ E1
↓ ↑ %ααα∀ααα∀→ααα→⊂αααπααα⊃
⊂αααααααπααααα⊃~ # # #
~ prop1 ~ val1~~ εαααβαααλ
# # # %←ααααα←ααααα←ααβ X ~ B ~
εαααααααβαααααλ εαααβαααλ
~ propn ~ valn~ # # #
%ααααααα∀ααααα$ %ααα∀ααα$
∞1p-list for ∞3X∞1
.END
.APART;
.GROUP;
Thus all instances of %3X%1 share a common property list, but
the value, or binding of %3x%1 is found using the environment chain.
This example is described in terms of deep binding, but the
property-list idea is also directly applicable to the shallow binding scheme.
To adapt p-lists to this binding scheme, we introduce a property named %3VAR%1
whose property value will be the collection of environment-value pairs:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
X ⊂αααααααπαααααα⊃ ⊂ααααπααα⊃
~ VAR ~ #αβα→αα→~ E4 ~ 3 ~
εαααααααβααααααλ εααααβαααλ
~ prop1 ~ val1 ~ ~ E3 ~ A ~
εαααααααβααααααλ εααααβαααλ
# # # ~ E1 ~ B ~
εαααααααβααααααλ %αααα∀ααα$
~ propn ~ valn ~
%ααααααα∀αααααα$
.END
.APART
In summary, property lists are a feature independent of the binding strategy
we have implemented. A property list contains all the aspects of an atom
which we wish to consider. The deep binding implementation emphaiszes that
the value of a variable is ssociated with the environment in which that
value is created. However shallow binding organization associates values
with variables, and thus it is natural to think of property-lists when
thinking of shallow binding.
We wish to look more closely at the value aspects of atoms.
We have seen three properties related
to the value of a variable: simple value, call-by-value function (expr) and
call-unevaluated function (fexpr). We were able to distinguish between exprs and
fexprs by one of two methods: either place the fexpr name in a special list
or store the fexpr as a β-expression, rather than as a λ-expression.
We made no explicit distinction between simple values and function values;
if a simple value appeared in the function position of an application, we
evaluated that expression until we %2did%1 discover a function object.
If a function appeared in a position expecting a simple value, then the
data structure interpretation of the function object was taken.
Since "<=" and "<%4f%1=" were defined to place the appropriate function definition
in the global table, we can interpret these operations as associating the
definition with the atom. That is, being an expr or fexpr is a property of the
atom. Similarly, globally bound variables like %3t%1 and %3nil%1 play the roles of
constants and therefore can be interpreted as having a value associated with
them. Primitive functions like %3car%1, and primitve special forms like
%3cond%1 should also be considered as being constants. Their "values"
are fixed procedures for specific call-by-value and call-unevaluated operations,
respectively.
.GROUP;
This discussion exemplifies five value-like properties which are properties
of an atom, rather than properties of a particular environment⊗↓Soon we will
discuss the possibilities of consolidating the %3VAR%1 property with these
other value properties. That will be the intent of
{yonss(P225)}.←:
.BEGIN TABIT2(15,45);GROUP;
\%3CONST%1\simple value; used as a constant
\%3EXPR%1\call-by value definition
\%3FEXPR%1\call-unevaluated definition
\%3SUBR%1\call-by-value primitive
\%3FSUBR%1\call-unevaluated primitive
.END
.APART
In {yonss(P57)}
we will introduce another protocol for assigning values
to variables, but at any one time an atom may have at most one
of these value-related indicators⊗↓Several
implementations allow atoms to have several of these system properties.
Thus expressions like %3car[car]%* are executable.
The implementation uses context to determine which %3car%1 is a simple variable
and which %3car%1 is the primitive procedure.
The current %3eval%* %2will%* operate correctly on this example since
a recognizer for the function %3car%* is explicitly encoded in %3apply%*,
but such tricks lead to unnecessarily
mysterious programs. Context is used by an evaluator in slightly
less obnoxious ways. For example,
an evaluator for %3prog%* can
tell a reference to the label %3x%* from the %3prog%*-variable %3x%* in
%3#...#prog[[x;y]#..#x#f[..]#...#g[x#..#]#...#go[x].%* See {yon(P83)}.
For
the same reason, the LISP obscenity %3λ[[lambda] ... ]%* will work.
Notice its S-expr representation is %3(LAMBDA#(LAMBDA)# ...#)%*.←.
.GROUP;
For example, %3car%1 might be represented as:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN ON "?" FOR "\";
⊂ααααααπααα⊃
~ SUBR ~ #αβ→∞1To machine language code for ∞3car∞b
%αααααα∀ααα$
.END
.BEGIN CENTER;
%2Part of the atom-structure for %3CAR%1
.END
.apart
When we use %3car%* as a function
the machine-dependent code will be executed. When we use the
atom %3CAR%1 we access the representation %bCAR%1.
Notice that each indicator is atomic, so when we write %bSUBR%1 we actually
mean a pointer to the representation of the atom %3SUBR%1. Every use of
an atom is a reference to its atomic structure.
.BEGIN CENTERIT;GROUP;
.P29:
As a further example, we might define:
←%3fact <= λ[[x][x=0 → 1; %et%* → times[x;fact[sub1[x]]]]]
%1The S-expr translation of the right-hand side would be:
←%3(LAMBDA (X) (COND ((ZEROP X) 1) (T (TIMES X (FACT (SUB1 X))))))
.END
To represent the intention that %3fact%* is to be defined as
the above recursive function, we might store the S-expr representation
on the property-list of the atom %3FACT%* and use %3EXPR%*
as its indicator. The occurrence of the atom %3FACT%1 in the λ-expression
is represented as a pointer to the atomic structure of %bFACT%1.
.GROUP
Here is part of the atom-structure for %3FACT%*:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂ααααααπαααα⊃
FACT ~ EXPR ~ #αβ→⊃
⊂αα→%αααααα∀αααα$ ~
↑ ⊂←αααα←ααααα←αα$
~ %→⊂ααααααααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
~ ~ LAMBDA ~ #αβαα→~ # ~ #αβα→~ # ~≤'~
~ %αααααααα∀ααα$ %αβα∀ααα$ %αβα∀αα$
↑ ~ ↓
~ ⊂αααααααααα$ ⊂ααααααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
~ ↓ ~ COND ~ #αβ→~ # ~ #αβ→~ # ~≤'~
~ ⊂αααπαα⊃ %αααααα∀ααα$ %αβα∀ααα$ %αβα∀αα$
↑ ~ X ~≤'~ ↓ ↓
~ %ααα∀αα$ ... ...
~ ↓
~ ⊂αααπααα⊃ ⊂αααπαα⊃
↑ ~ # ~ #αβ→~ # ~≤'~
~ %αβα∀ααα$ %αβα∀αα$
%←αααα←αααααα←αααααααα←αααααααα←αααααααα←ααααααα←$ ↓
⊂←αααααα←αααα$
↓
⊂ααααααπααα⊃ ⊂αααπαα⊃
~ SUB1 ~ #αβ→~ X ~≤'~
%αααααα∀ααα$ %ααα∀αα$
.END
.BEGIN CENTER;
%2Atom-structure for %3FACT%1
.END
.APART
Note that both instances of %bX%1 are actually pointers to the representation
of %3X%1, but the two representations of %3(X)%1 will typically be
distinct.
Also keep in mind that we are storing the data structure
%2representation%1 of the %3fact%* function thus
.BEGIN CENTERIT;
←%3(LAMBDA (X) (COND ((ZEROP X) 1) (T (TIMES X (FACT (SUB1 X))))))
.END
is a perfectly good list and therefore a data structure. If we attached it
to the indicator %3VALUE%* then we would have represented the list as the
%2simple%* value of %3fact%*.
Representations of special forms like %3COND%* and %3QUOTE%* give rise to the
indicator %3FSUBR%*.
When an instance of such a special form is
recognized, the argument list is passed to the primitive without
any evaluation.
In a similar manner we introduce the indicator %3FEXPR%1 to designate
the occurrence of a "<%4f%1=" definition.
Our discussion of property lists as carriers of values has centered
on the representations of constants. Many identifiers in LISP
tend to be constants; even though we can redefine functions using
a version of "<=", most such definitions are relatively constant⊗↓To
define a temporary function we use %3label%1.←. The primitve functions
are also constant. Shallow binding tries to capitalize of this
observation, by associating all of the value aspects of a variable
with the property list, and we will soon see that for several interesting
subsets of LISP, we can significantly simplify the handling of shallow binding.
Before discussing that we will
show how an evaluator might use such property-list information. That
requires introduction of property-list manipulating functions.
.SS(Property-list functions)
.BEGIN INDENT 0,13;GROUP
.P142:
There are four functions for manipulating the property-list:
.p52:
%3putprop[a;v;p]:%*
⊗→%3putprop%*↔← will put the value %3v%* under the property %3p%* on the
property-list
of the atom %3a%*. If the property %3p%* already appears on the p-list then the
%3v%* over-writes the old value; otherwise a new property-value pair is added
to the front of the p-list of %3a%*. The value returned by %3putprop%* is %3v%*.
.END
.BEGIN INDENT 0,8;GROUP
%3get[x;i]: %*
⊗→%3get%*↔← will search the property-list of the atom %3x%* looking for the
indicator %3i%*. If %3i%* is found the value associated with that indicator is
returned by %3get%*.
If %3x%* does not have the
indicator then %ef%* is returned.
Thus %3getval[x]%1 could be defined as %3get[x;VAR].
.END
.BEGIN INDENT 0,8;GROUP
%3getl[x;l]:%*
%3l%* is a list of indicators.
⊗→%3getl%*↔←
will search the property-list of the atom %3x%* for the %2first%* occurrence
of any indicator which appears in %3l%*. If such a match
is found, then the %2remainder%* of the p-list, beginning with the
matching indicator, is returned. If no match is found, %ef%* is
returned.
The virtue of %3getl%* is that it can distinguish between an atom which has
an indicator with value %ef%* and an atom which does not have the indicator.
%3get%* cannot communicate this distinction.
The disadvantage of %3getl%1 is that it gives access to the internal structure,
and therefore the representation, of the atom. Such p-list functions are useful
but dangerous (⊗↑[Sam#75]↑).
.END
.BEGIN INDENT 0,12;
%3remprop[n;p]%1: The final function in this class is used to remove
property-value pairs
from the p-list of an atom. The function is named ⊗→%3remprop%*↔←.
%3remprop%*
has two arguments: %3n%*, an atom; and %3p%*, a property. If the property
is found on the
p-list of the atom, then %3remprop%* removes the property-value pair and returns
%et%*; otherwise it returns %ef%*.
.END
.SS(An %3eval%1 for p-lists)
The evaluators in this section illustrate the use of property lists and introduce the first
non-trivial application of LISP's ability to interchange program with data.
Though this chapter is mostly concerned with the %2static%1 organization of
LISP, the ideas involved in the evaluator are sufficiently important and
demonstrative of the power of property lists that we include them here
rather than later.
The first evaluator uses property names like %3CONST%1 and %3EXPR%1 as names of functions.
Discovering that an atom has the %3CONST%1 property, the evaluator %2applies%1
the function named %3const%1 to perform the evaluation. In this case,
the evaluation is simple: return the represented constant.
We will assume a shallow binding implementation and therefore
variables are handled by recognizing the property %3VAR%1 and
passing the evaluation to the function %3var%1. In the case of an
application, we have more work to do; %3expr%1 handles that.
The actual application, is handled by using LISP's computed function
facility discussed on {yon(P227)}.
We will describe a sequence of evaluators based on property list manipulation.
We will not express all of the details of this family of evaluators, but
leave many of the details to the reader.
.BEGIN SELECT 3;GROUP;TABIT1(18);
eval <= λ[[exp;env]\[atom[exp] → name[first[getl[exp;(VAR CONST)]]] [exp;env];
\ atom[first[exp]] → name[first[getl[first[exp];(EXPR FEXPR)]]] [exp;env];
\ ... ]]
.END
.BEGIN SELECT 3;GROUP;TABIT2(18,28);
var <= λ[[form;env] lookup[form;env]]
const <= λ[[form;env] denote[form;env]]
expr <= λ[[form;env]\λ[[def][eval[\body[def];
\\mkenv[vars[def];evlist[args[form];env]] ]
\ [get[func[form];EXPR]]
.END
It's not clear yet that any substantial benefit has been gained.
With a slight change, we could extract a small improvement. Replace
the explicit lists %3(VAR#CONST)%1 and %3(EXPR#FEXPR)%1 with
%3idprop%1 and %3appprop%1. Bind these variables globally to the
respective lists. Then if we wished to define a new kind of calling
sequence, say %3gexpr%1s, we could add %3GEXPR%1 to %3appprop%1 and
write a function named %3gexpr%1 to handle the evaluation
of %3gexpr%1 forms. However with further analysis, we can do much better.
Consider simple variables. Each instance of a simple variable has the
same value property; when we see %3x%1 we apply %3lookup%1 through %3var%1;
when we see %3y%1 we apply %3lookup%1 through %3var%1. However
the association of a value property with each %2instance%1 of a variable
is discomforting. The value property is more a property of the class of variables
than it is a property of an instance.
That is,
an instance receives a property by being a member of a certain class.
Let the atom %3VAR%1 represent the class of variables; let the atom
%3EVAL%1 represent the property name describing value properties.
The function %3lookup%1 is therefore a property value of the atom
%3VAR%1, and should be associated with the property name %3EVAL%1.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
X ⊂αααααααπαααααα⊃ ⊂ααααπααα⊃ VAR ⊂ααααααπαααααααα⊃
~ VAR ~ #αβα→αα→~ E4 ~ 3 ~ ~ EVAL ~ LOOKUP ~
εαααααααβααααααλ εααααβαααλ %αααααα∀αααααααα$
~ prop1 ~ val1 ~ ~ E3 ~ A ~
εαααααααβααααααλ εααααβαααλ
# # # ~ E1 ~ B ~
εαααααααβααααααλ %αααα∀ααα$
~ propn ~ valn ~
%ααααααα∀αααααα$
.END
Now the evaluator should do a double %3get%1, looking down the property list of
an object, for a class name which has an %3EVAL%1 property.
Finding %3EVAL%1, the evaluator applies the associated property value to the
object and the environment.
Before presenting the next evaluator we should settle one more point.
In the preceeding %3eval%1 we ignored the question of anonymous λ-expressions;
we assumed that the function-part of an application was an atom. We
did this becaue we have implied that only atoms have property lists. We will
remove this restriction for the next evaluator and assume that any object
can have a property list. A λ-expression will have a property list with
(at least) property name %3LAMBDA%1 and property value of the repesentation
of the λ-expression. The atom %3LAMBDA%1 will have an %3EVAL%1 property
whose value is the function %3eval_λ%1; this function will evaluate
applications whose function-parts is a λ-expression.
Finally, since %3eval_λ%1 handles most of the evaluation of an application,
there is no need to make %3expr%1 do it too. In fact, our previous distinction
between %3VAR%1 and %3EXPR%1 is unnecessarily restrictive. We should
be able to return functions as values just as we can return constants
or simple values.
So %3EXPR%1 should be a property name, with property value being
the λ-expression, but %3EXPR%1 should now have an %3EVAL%1 property
which is just %3lookup%1.
.GROUP
For example:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
plist for FACT plist for (LAMBDA (X) (COND ((ZEROP X) ... )))
⊂ααααααπααα⊃ ⊂ααααααααπααα⊃
~ EXPR ~ #αβ→ααααα→~ LAMBDA ~ #αβ→αα→ (LAMBDA (X) (COND ((ZEROP X) ...
εααααααβαααλ εααααααααβαααλ
# # # # # #
%αααααα∀ααα$ %αααααααα∀ααα$
plist for EXPR plist for LAMBDA
⊂ααααααπαααααααα⊃ ⊂ααααααπααααααααα⊃
~ EVAL ~ LOOKUP ~ ~ EVAL ~ EVALLAM ~
εααααααβααααααααλ εααααααβαααααααααλ
# # # # # #
%αααααα∀αααααααα$ %αααααα∀ααααααααα$
.END
.APART
With all this extra mechanism in place, %3eval%1 does absolutely nothing!
.BEGIN SELECT 3;GROUP;CENTER;
eval <= λ[[exp;env] getget[exp;EVAL] [exp;env]],
.END
where %3getget%1 looks at property names associated with %3exp%1
until it finds one which itself has a property list containing %3EVAL%1.
Now real progress has been made.
The evaluation of any expression is controlled by a function associated
with the class to which that expression belongs. It is trivial to modify
or extend such an evaluator: supply the class name and the appropriate
%3EVAL%1 property value.
The techinque is applicable to more general kinds of computation than just
evaluation. With a class name we can associate arbitrary pairs of properties
and functions. For example, we might wish to define special input or output
conventions for classes; to do this we simply associate a %3READ%1 property
and a %3PRINT%1 property with the class name and supply routines to
perform the specail reading or prnting. Similarly, we can assocaiate
a compile property, and a function describing how to compile
instances of this construct⊗↓The language EL1 (⊗↑[EL1#74]↑)
incorporates a similar scheme,
however only five designated properties can be associated with any
class name. The techniques of syntax-directed input-output, developed
in {yonss(P105)}, are also applicable to such an evaluator.←.
The net effect of this reworking of evaluation is to expose a much more
general scheme for handling computation.
Such a distributed %3eval%1 is an example of a LISP technique
called data-driven programming (⊗↑[San#75a]↑)⊗↓The author first
saw this technique used in a non-trivial way
in ⊗↑[Dif#71]↑ in the Stanford LISP compiler←.
***DO DATA BASE EXAMPLE****
.SS(Representation of property-lists)
In discussing representations, we must keep the essential characteristics
of property lists well in mind. A property list is similar to a local
environment block; each property list is a sequence of names and
values. However a property list can grow and shrink dynamically, whereas
an environment block is created at a fixed size. Since we cannot
predict the size of the block, a natural representation is that of a list.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂αααααααπααααα⊃
~ prop1 ~ val1~
εαααααααβαααααλ
~ prop2 ~ val2~ ⊂αααααααπααα⊃ ⊂αααααπααα⊃ ⊂αααααααπααα⊃ ⊂αααααπαα⊃
# # # ~ prop1 ~ #αβα→~ val1~ #αβ→ ...→~ propn ~ #αβα→~ valn~≤'~
εαααααααβαααααλ %ααααααα∀ααα$ %ααααα∀ααα$ %ααααααα∀ααα$ %ααααα∀αα$
~ propn ~ valn~
%ααααααα∀ααααα$
∞2Property list Representation∞*
.END
The elements of the p-list are associated in pairs. The first element
of a pair is
the property, the next element is the property value.
But atoms
can't be represented as just any arbitrary list.
We must be able to recognize the occurrence of an atom so that we can
implement the predicate %3atom%*.
There are at least two alternatives. We might partition memory as we
began to do on {yon(P159)} with pointer space and atom space.
Then the test of atom-ness is a test on the location being referenced.
We might also preface the property list with an "atom#header" which will
signal the beginning of the atom. Here the test for atom-ness is a test on the
contents of the location being referenced.
In the first case we dedicate a section of memory for the storage of
atoms⊗↓We need not dedicate a whole section of the machine to "atom space".
Several techniques are available for "mapping" a conceptually
contiguous space onto a scattered memory.←;
in the
second case we require an extra memory reference.
A related efficiency consideration involves the use of property lists
by the LISP implementation. Since the evaluator will be making frequent
examination of the p-list, it is often useful to store the system-related
information in specific relative positions of the p-list. This will obviate
a search using %3get%1.
Using a separate "atom space", an atom would be represented by its property
list. In this case, property lists
need not be stored in pointer space. {Yonsec(P210)} examines some of these
techniques. In the text we will describe atoms using the "atom header"
since it makes it clearer in pictures.
Atoms will be special lists whose
%3car%*-part contains an
indicator used exclusively for the beginning of atoms.
We will use %b∃%* to designate the beginning of an atom.
The %3cdr%* of the representation of the atom is the representation of
the property list.
Such locations in pointer space containing %b∃%* in their left-half and
a pointer to a p-list in their right-half are called %2⊗→atom header↔←s%*.
.GROUP
For example, here is part of a representation
for the atom for %3car%*:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN ON "?" FOR "\";
.TABS 52,57;
⊂ααπαααα⊃ ⊂ααααααπαααα⊃ ⊂αααπααα⊃
~∃ ~ #αβαα→~ SUBR ~ #αβαα→~ # ~ # ...→
%αα∀αααα$ %αααααα∀αααα$ %αβα∀ααα$
~
%→αα→ to machine code for ∞3car∞*
.END
.P50:
.BEGIN CENTER;
%2Part of the atom-structure for %3CAR%1
.END
.apart
.GROUP
An example of the distinction between %3get%1 and %3getl%1 in our
representation may be useful. Notice how both functions allow
unnecessarily free access to the internal representation.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
∞3getl[FOO;(BAZ)]∞*
⊗
∞3getl[FOO;(PNARF BAZ)]∞* ∞3get[FOO;PNARF]∞*
↓ ↓
⊂ααα←ααααααα$ ~
⊂απααα⊃ ↓ ⊂αααααπααα⊃ ⊂αααααπααα⊃ ⊂αααααααπααα⊃ ⊂αααπαα⊃ ~
~∃~ #αβαα→~ BAZ ~ #αβα→~ NIL ~ #αβα→~ PNARF ~ #αβα→~ # ~≤'~ ↓
%α∀ααα$ %ααααα∀ααα$ %ααααα∀ααα$ %ααααααα∀ααα$ %αβα∀αα$ ~
ε←ααα←ααα$
↓
⊂αααπαα⊃
~ A ~≤'~
%ααα∀αα$
.END
and %3get[FOO;BAZ] = get[FOO;BAR] = NIL%*.
.APART
The simple atom is becoming much more complex. It has a whole substructure
attached to it. Thus each atom is like a word in a dictionary; many words can
be used as different parts of speech and their dictionary entries will
reflect this by having several alternative meanings. Similarly,
an atom can have several different "meanings" attached to it
and depending on the context, we will be interested in one of those interpretations.
Just as we will find all meanings of a specific word in one location in the
dictionary, our implementation of LISP
becomes much simpler if we store each atom and its associated p-list uniquely.
For example,
every reference to the atom %3A%* is actually a pointer to the same
location in memory. This location has a %3car%*-part which is the special
atom indicator %b∃%*,
and a %3cdr%*-part which is the ⊗→p-list↔← for the atom %3A%*.
.GROUP
Thus %3(A . A)%*, which we have been representing as:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂αααααπααααα⊃
~ A ~ A ~
%ααααα∀ααααα$
.END
in more detail, is represented as:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂αααααπααααα⊃
~ # ~ # ~
%ααβαα∀ααβαα$
~ ~
~ ~ ⊂απααα⊃
%ααααα∀α→~∃~ #αβα→ ∞1p-list for ∞3A∞b.
%α∀ααα$
.END
.APART
.P48:
The internal structures of this implementation of LISP are %2not%* L-trees,
but list structure;
there are intersecting and circular branches.
LISP deals with binary list structure since each non-terminal node in our
representation has exactly two branches.
Assume we have the above dotted pair as a value of a variable %3x%*
and we wish to print the value of %3x%*.
Clearly we would hope to see "%3(A . A)%*" appear on the output device.
We would expect that the print routine, named %3print%*, would be given
a pointer to the dotted pair. %3print%* can recognize that it %2is%* a dotted
pair since its %3car%* is not %B∃%*.
But how can %3print%* distinguish %3(A . A)%* from %3(B . B)%* for example?
The pointer in the preceding diagram will point to %3A%* in one case
and to %3B%* in the other, but nothing on the p-list of the atom tells us %2what%*
to print. The simplest thing to do is to store a representation of the name
on each p-list.
This is done with another indicator called ⊗→%3PNAME%*↔←, standing for
%2⊗→print-name↔←%*. Each atom is guaranteed to have a print-name or "p-name".
The print-name
of the atom is what the LISP output routine will print as the name of
the atom.
.GROUP
The atom %3BAZ%* will have at least the following:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂αααααααπααααααα⊃
~ PNAME ~ "BAZ" ~
%ααααααα∀ααααααα$
.END
Where %b"BAZ"%1 means a representation of the string of characters, %3B,#A,#Z%1.
.APART
.P216:
When we wish to %2represent%1 such a property pair we must deal with
representational problems of character strings.
The salient properties here are
the desire to have strings of unbounded length, but represent them in a machine
with fixed word size. We will discuss a more general representation
in {yonss(P27)}, but here we will represent the print name by using the
basic dotted-pair data structure of LISP.
.GROUP
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂αααπαα⊃
~ # ~≤'~
%αβα∀αα$
↓
⊂ααααααα⊃
~ BAZ≡≡ ~
%ααααααα$
.END
←%2p-name representation for %3BAZ%1.
.APART
%bBAZ≡≡%1 means a memory location
containing some encoding of the letters %bB%*, %bA%*, and %bZ%*.
The symbol, %b≡%*, represents some non-printing character; thus in the above
diagram, we
are assuming a location can contain five characters.
.GROUP
We represent the print-name as a list so that we may allow atoms with
p-names of unbounded length. Thus the p-name for %3BLETCH%*,
would be:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂αααπααα⊃ ⊂αααπαα⊃
~ # ~ #αβαααα→~ # ~≤'~
%αβα∀ααα$ %αβα∀αα$
↓ ↓
⊂ααααααα⊃ ⊂ααααααα⊃
~ BLETC ~ ~ H≡≡≡≡ ~
%ααααααα$ %ααααααα$
.END
←%2P-name structure for %3BLETCH%1.
.APART
With such print-names on each property-list
%3print%* can now function: it will search the p-list for the indicator %3PNAME%*
and print what it finds. For the details of LISP output see {yonss(P13)}.
The %3print%1 routine needs the print name and, as we shall see shortly, the
input routine needs the print name, but otherwise all LISP calculation is
done using the internal pointers to the representation of the property list.
Several implementations exploit this observation by placing the print#names
in slower memory than that used for the main computation. Since
access to print names is infrequent, we can afford to spend more time
in retrieving them.
Notice that the actual print-names %bBAZ≡≡%*, %bBLETC%*, and %bH≡≡≡≡%*, are
candidates for storage in atom space since these encodings should not
be interpreted as pointers.
Since "atom space" is no longer an appropriate descriptor, we will
give it a new name.
We will call that area
of memory which contains information %2not%* to be interpreted as
pointers, %2⊗→Full Word Space↔←%*, and abbreviate it as ⊗→FWS↔←.
In summary, we have discussed the details of a typical implementation of the
class of S-exprs. Our non-atomic S-exprs have their branching structure
stored in what we have called pointer space. Our initial discussion
of atoms supposed a particular simple representation: simply store the
encoding of the atom in a memory location found in a separate space which we called
atom space. Upon further reflection we decided that atoms should play
a more active role in the implementation. Since identifiers are to be represented as
atoms we needed some way to represent those properties typically associated with
identifiers. Identifiers in LISP are, among other things, used for names of
functions and names of variables. Thus we needed the ability
to represent at least these two kinds of "values" with a LISP atom.
We introduced the general scheme of property-lists and associated
such a p-list with each LISP atom. All the things we know about a specific
atom are stored on the p-list. Thus we want each atom stored uniquely;
then anyone who wants to examine the properties of an atom has
only to look at the p-list.
Since all of our LISP programs must be read into the memory we let the
input function keep track of the details.
On reading
a literal atom,
the program checks the current table of atoms.
If the atom appears, the program returns a pointer to the entry.
If the atom does
not appear it constructs a new table entry consisting of the print name.
The effect of storing atoms uniquely was to turn
our abstract LISP-trees into list structure.
Indeed, the representation of a LISP expression is a
complex net of pointers; even atoms are now pointers. The only
LISP objects we have represented which are %2not%* pointers are the actual print-names
like %bBAZ≡≡%*.
To reinforce our discussion we illustrate
the abstract picture of %3NIL%1 and, on the next page, the
representation of the atom %3NIL%*. In all of the resulting worms
there are only three elements in Full Word Space; everything else is a
pointer.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂αααααα→αααααα→ααααααα→αααααα→ααααααα⊃
↑ ↓
⊂ααα←ααααα←ααααα⊃ ~
↓ ↑ ~ ⊂α←α⊃ ↓
~ ⊂αβαπααααα⊃ ↑ ↓ ⊂αβαπααααααα⊃ ⊂αααπααααααα⊃
%αα→~ # ~ #α→αβ→$ ε→~ # ~ PNAME ~ ~ # ~ CONST ~
εαααβαααααλ ↑ %ααα∀ααααααα$ %αβα∀ααααααα$
~ # ~ NIL ~ %←αααααα←αααααα←ααααα$
%αβα∀ααααα$ ↑
%→αααααα→αααααα→$
.END
.<<pic of NIL>>
.SKIP TO COLUMN 1;
.SS(A picture of the atom %3NIL%*,%3NIL%*)
.P141:
We have been writing the atom %3NIL%* as:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂απααα⊃ ⊂αααααααπααα⊃ ⊂αααπααα⊃ ⊂αααααααπααα⊃ ⊂αααπαα⊃
~∃~ #αβα→~ CONST ~ #αβα→~ # ~ #αβα→~ PNAME ~ #αβα→~ # ~≤'~
%α∀ααα$ %ααααααα∀ααα$ %αβα∀ααα$ %ααααααα∀ααα$ %αβα∀αα$
↓ ~ ⊂αααπαα⊃
NIL %α→~ # ~≤'~
%αβα∀αα$
↓
⊂ααααα⊃
~NIL≡≡~
%ααααα$
.END
where the atoms for %3PNAME%* and %3CONST%* are represented as:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂απααα⊃ ⊂αααααααπααα⊃ ⊂αααπαα⊃ ⊂απααα⊃ ⊂αααααααπααα⊃ ⊂αααπαα⊃
~∃~ #αβα→~ PNAME ~ #αβα→~ # ~≤'~ ~∃~ #αβα→~ PNAME ~ #αβα→~ # ~≤'~
%α∀ααα$ %ααααααα∀ααα$ %αβα∀αα$ %α∀ααα$ %ααααααα∀ααα$ %αβα∀αα$
↓ ↓
⊂αααπαα⊃ ⊂αααπαα⊃
~ # ~≤'~ ~ # ~≤'~
%αβα∀αα$ %αβα∀αα$
↓ ↓
⊂ααααα⊃ ⊂ααααα⊃
~PNAME~ ~CONST~
%ααααα$ %ααααα$
.END
In full detail, %3NIL%1 is represented as:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂ααααααααααααααααααα←ααααααααααααααααααααααααααα←ααααααπααααπααααααααπααα⊃
↓ ↑ ~ ↑ ↑
⊂απααα⊃ ⊂αααααααπααα⊃ ⊂αααπααα⊃ ⊂αααααααπααα⊃ ⊂αααπαβα⊃ ↑ ~ ~
~∃~ #αβα→~ # ~ #αβα→~ # ~ #αβα→~ # ~ #αβα→~ # ~ # ~ ~ ~ ~
%α∀ααα$ %αααβααα∀ααα$ %αβα∀ααα$ %αααβααα∀ααα$ %αβα∀ααα$ ~ ~ ~
↑ %αααααααααα→ ~ →αααααααα→ ~→αα⊃ ~ ↑ ~ ~
~ ↓ ~ ~ ~ ⊂αααπαβα⊃ ~ ~
~ ~ ↓ ↓ %α→~ # ~ # ~ ~ ~
~ ↓ ~ ~ %αβα∀ααα$ ~ ~
~ ~ ~ ~ ↓ ↑ ↑
%ααααααα←ααααααααα←αααααα∀αααπαααα⊃ ~ ~ ⊂ααααα⊃ ~ ~
↑ ↑ ~ ~ ~NIL≡≡~ ~ ~
~ ~ ↓ ~ %ααααα$ ~ ~
⊂αααααααααα←ααααααααααααααα← ~ ←α ~ ←α$ ~ ~ ~
↓ ~ ~ ↓ ~ ~
⊂απααα⊃ ⊂αααααααπααα⊃ ⊂αααπαβα⊃ ~ ⊂απααα⊃ ⊂αααααααπααα⊃ ⊂αααπαβα⊃ ↑
~∃~ #αβα→~ # ~ #αβα→~ # ~ # ~ ↑ ~∃~ #αβα→~ # ~ #αβα→~ # ~ # ~ ~
%α∀ααα$ %αααβααα∀ααα$ %αβα∀ααα$ ~ %α∀ααα$ %αααβααα∀ααα$ %αβα∀ααα$ ~
↑ ~ ↓ ~ ~ ↓ ~
εααααα←ααααα$ ⊂αααπααα⊃ ↑ ↓ ⊂αααπααα⊃↑
~ ~ # ~ #αβα$ ~ ~ # ~ #αβ$
~ %αβα∀ααα$ ~ %αβα∀ααα$
~ ↓ ~ ↓
↑ ⊂ααααα⊃ ~ ⊂ααααα⊃
~ ~PNAME~ ↓ ~CONST~
~ %ααααα$ ~ %ααααα$
%αααααααα←ααααααααααααα←ααααααααααααα←αααααααααααααα$
.END
.SS(Input/Output: %3read%* and %3print%*,,P13:)
.TURN ON "←";
.BEGIN "FOO";TURN OFF "α";
When you begin to implement LISP you find that a very large
part of LISP can be written in LISP. We have already seen that the
evaluation process is expressible this way.
Here we will see that the majority of the I/O routines can be
written as LISP functions calling a very few primitive routines.
The primitive routines can also be described in `pseudo-LISP'.
That is, we will give a LISP-like description of them, though
they would normally be coded in machine language.
The primitive functions are ⊗→%3ratom%*↔← and ⊗→%3prin1%*↔←.
.BEGIN INDENT 0,7;
.GROUP
%3ratom[ ]%* is a function of no arguments. It reads the input string, constructing
the next atom
or special character (left paren, right paren or dot).
It looks up that object in the atom table and returns
a pointer to that table entry. If no entry
is found an appropriate entry is made.
%3ratom%* flushes spaces and commas,
recognizing them as delimiters. It returns only atoms or characters special
to %3read%*.
.APART
.GROUP
%3patom[x]%* is a function of one argument expecting an atom, left paren,
right paren, blank, or dot as the value of its argument.
It will print the p-name of that object on the output
device.
.APART
.END
%3ratom%* is the LISP %2⊗→scanner↔←%*. A scanner is a basic part of
an input processor. Typically
a scanner's job is to negotiate with the actual input device for input
characters. The scanner builds the most basic ingredients, like identifiers
from alphanumeric strings, or numbers from digit strings, and only after
such a basic block has been recognized is the next level of syntax
analysis attempted.
The units, also called tokens, which the scanner has built are passed to the
%2⊗→parser↔←%*.
The typical parser builds a tree-representation of the input string;
the structure of the tree reflects the language constructs which were present in the
input.
A well-designed parser uses the BNF equations which
describe the class of well-formed input expressions to determine whether or not
the input stream is a legal expression.
%3read%* is the LISP ⊗→parser↔←.
%3read%*
will recognize both S-expression and list-notation;
besides analyzing the input stream, %3read%*
also builds an S-expression representation of the input.
To make life simpler we need to be able to refer to atoms whose print-names are
the characters "%3)%*", "%3(%*", ".", and " "#(blank).
We will assume that ⊗→%3RPAR%*↔←, ⊗→%3LPAR%*↔←,
⊗→%3PERIOD%*↔←, and ⊗→%3BLANK%*↔← denote such atoms. For example, if the next input
character is "(" then
←%3eq[ratom[ ];LPAR]%* is true (and the input pointer is moved to
the next character!).
%3patom[PERIOD]%* will have the effect of printing a %2"."%* on the output
device.
We will discuss the structure of %3ratom%* and %3patom%* in a moment. In
the meantime here is %3read%*:
.BEGIN TABIT2(17,20);TURN OFF"←";SELECT 3;
.GROUP
⊗→%3read%*↔← <=λ[[ ]prog[[j]
\j ← ratom[ ];
\[atom[j] →return[j];
\ is_lpar[j] → return[read_head[ ]];
\ is_dot[j] → err[ ];
\ is_rpar[j] → err[ ] ]]]
.APART
.BEGIN FILL;select 1;
%3read%* will return a representation of a legal S-expr or list, %9α%1.
%3err%1 is a LISP system function which, in this case, will terminate the input scanning
immediately.
%3read_head%* will translate strings %9α%* acceptable in the context "%3(%9α%1".
Thus %9α%* being %3"A)"%* or %3"A#(B#.#C))"%* would be suitable for %3read_head%*;
%3(A)%* and %3(A#(B#.#C))%* are S-exprs or lists.
%3.#A)%1 would not be acceptable since %3(.#A)%1 is not an S-expr or list.
.END
.GROUP
⊗→%3read_head%*↔← <= λ[[ ]prog[[j]
\\j ← ratom[ ];
\\[atom[j] → return[cons[j;read_tail[]]];
\\ is_lpar[j] → return[cons[read_head[ ];read_tail[ ]]];
\\ is_dot[j] → err[ ];
\\ is_rpar[j] → return[NIL] ]]]
.APART
.GROUP;
.BEGIN FILL;SELECT 1;
%3read_head%* is looking for legal %9α%*'s in the context "%3(%9α%1".
If it sees:
.END
.BEGIN NOFILL;SELECT 1;
\an atom, then %9α%1 is <atom>%9β%3)%1;
\a left parenthesis, then %9α%1 is %3(%9β%3)%9∂%3)%1;
\a dot, then %9α%1 is %3.%9β%3)%1;
\a right parenthesis, then %9α%1 is %3)%1.
.END
.APART
.GROUP
⊗→%3read_tail%*↔← <= λ[[ ]prog[[j]
\\j ← ratom[ ];
\\[atom[j] → return[cons[j;read_tail[]]];
\\ is_lpar[j] → return[cons[read_head[ ];read_tail[ ]]];
\\ is_dot[j] → return[read_cdr[]];
\\ is_rpar[j] → return[NIL] ]]]
.APART
.GROUP;
.BEGIN FILL;SELECT 1;
%3read_tail%* is looking for legal %9α%*'s in the context "%3(%1<sexpr>#%9α%1".
The structure of this function is that of %3read_head%* except for recognition
of dots. "%3.%9β%3)%1" %2is%1 plausible in the context "%3(%1<sexpr>#%9α%1".
It is up to %3read_cdr%* to see if its expectations are fulfilled.
.END
.APART
.GROUP;
%3read_cdr <= λ[[ ] prog[[j]
\\j ← read[ ];
\\[is_rpar[ratom[ ]] → return [j];
\\ %et%* → err[ ] ]]]
.BEGIN FILL;SELECT 1;
That is, the only input legal after a dot is a S-expr or list followed by a right
parenthesis.
.END
.APART
.CENTER;
.SELECT 3;
is_dot[x] <= eq[x;PERIOD] is_lpar[x] <= eq[x;LPAR], ...%1 etc.
.END
Here are %3print%* and friends. %3terpri%* initiates a new output line.
(note: the value of %3print[x]%* is %3x%*.)
.BEGIN TABIT3(10,13,25);TURN OFF "←";SELECT 3;
.GROUP
⊗→%3print%*↔← <= λ[[x]prog[[ ]
\prin0[x];
\terpri[ ];
\return[x]]]
.APART
.GROUP
⊗→%3prin0%*↔← <= λ[[x]prog[[j]
\\[atom[x] → return[patom[x]]];
\\j ← x;
\\patom[LPAR];
\a3\prin0[car[j]];
\\[null[cdr[j]] →\patom[RPAR];
\\\return[x]];
\\patom[BLANK];
\\[atom[cdr[j]] →\patom[PERIOD];
\\\patom[BLANK];
\\\patom[cdr[j]];
\\\patom[RPAR];
\\\return[x]];
\\j ← cdr[j];
\\go[a3] ]]
.BEGIN FILL;SELECT 1;
Notice that we have used the extended conditional expression as described
in {yonss(P67)}⊗↓Notice too that %3print[(A#.(B#.#C))]%* prints as
%3(A#B#.#C)%1. This is because %3print%1 doesn't know that the structure is
not a list until it sees the last dotted-pair. There are two ways of handling this:
either require a type-code, telling whether the structure is a dotted pair or
a list, represented as a dotted pair. Then %2all%1 dotted pairs are printed in dot
notation, and %2all%1 lists are printed in list notation. The other alternative
is to first examine the structure; if it is a list representation, then print it
that way, otherwise print it as a dotted pair. This problem is another
indication of "object vs. representation".←.
.END
.APART
.END
The basic %3print%1 routine allows us to print data structures and program
representations; however the format of such output is a simple linear string
of atoms, numbers, spaces, and parens.
.GROUP;
For example a %3print%1-based
program for printing function definitions might print the following as the
definition of %3member%1:
.BEGIN FILL;SELECT 3;INDENT 20,20,20;
(MEMBER (LAMBDA (X L) (COND ((NULL L) NIL) ((EQ X (FIRST L)) T) (T (MEMBER X (REST L))))))
.END
.APART
The print routine can break the text at the end of any atom⊗↓Some implementations
even allow the printer to break in the middle of an atom. This is accomplished
by designating a special character for carriage control, and the %3read%1 routine
knows to ignore the immediately following end-of-line sequence.←; the only
restriction we place on printing of expressions is that what is %3print%1-ed
must be %3read%1-able.
Even with a small definition like this, we have difficulty deciphering
the structure. When functions or lists become large and deeply nested
the readability becomes unmanageable. Most implementations of LISP supply
formatting programs called "pretty-printers" or "grinders" to supply indenting
to the basic print routine.
.GROUP;
Thus a pretty-printer might print %3member%1 as:
.BEGIN SELECT 3;TABIT1(10);
(MEMBER
(LAMBDA (X L)
(COND\((NULL L) NIL)
\((EQ X (FIRST L)) T)
\(T (MEMBER X (REST L))))))
.END
.APART
See {YONSS(P226)} for a more detailed description of such functions.
So far we have thrown all the I/O back on %3ratom%* and %3patom%*. Clearly
%3ratom%* will be more interesting. All %3patom%* need do is get the p-name and
print it. %3ratom%* needs to do an efficient search of the atom table and if
the atom is not found, add it to the table. All %3ratom%* has to work
with is the actual character string which
will be the p-name of some atom. What %3ratom%* could do is look at the
p-name of each atom currently in the table of atoms; when it finds a match
it returns a pointer to that atom.
This is essentially the search scheme of %3assoc%*.
If the appropriate
atom is not found it can build a new one consisting of the p-name, add it
to the table, and return a pointer to this new entry.
In the next section we will introduce an alternative scheme called hashing.
.CENT(Problems)
%21.%* Let %3prin_fw[x]%* be a primitive printing function where %3x%* is to
be an actual element in Full Word Space. %3prin_fw%* is to print the
character string on the output device. You may assume it knows about
%b≡%*. Write %3patom%* using %3prin_fw%*.
%22.%* You might have noticed that the definitions of %3read_head%* and
%3read_tail%*
are almost identical: the difference involves treatment of dots.
Write new versions of these functions utilizing a common routine and functional arguments.
%23.%* Write the set of BNF equations that drive %3read%*.
.SS(Table searching: Hashing,hashing,P14:)
Table lookup is analogous to the problem of looking up
words in a dictionary. The scheme of %3assoc%*
is analogous to a person beginning at page one of the dictionary and
proceeding linearly (page-by-page and word-by-word) through
the book until he finds the word in question.
What we normally do is look at the first character of
the word and go immediately to the subsection of the dictionary which
has the words beginning with that character. We know that if
we cannot find the definition of our word in that subsection we need
look no further in the book. Usually we delimit our search even
further by keying on subsequent characters in the word. Finally
we may resort to linear search to pin down the word on a specific
page or column.
What we want is a similar scheme for the machine. We might in fact
mimic the dictionary search, subdividing our table into 26 subsections.
We can do better.
Since it is the machine which will subdivide and
index into the table, we can pick a scheme which is computationally
convenient for the machine. Besides being convenient, the scheme should
result in rather even distribution of atoms in the subsections.
If the majority of the atoms end up in the same partition
of the table we will have gained little towards improving the
search efficiency.
Each of the partitions
or subdivisions of the table is called a %2⊗→bucket↔←%*. Each atom will
appear in at most one bucket. The computational scheme or function
used to determine which bucket a particular atom belongs in is called
a %2⊗→hashing function↔←%*. All ⊗→%3ratom%*↔← has to work with is %3chr-str%*, the
encoding of the actual name of the atom.
The hashing function
will use %3chr-str%* to determine which bucket to examine.
Given the bucket number,
we then run down the list of atoms in that bucket, comparing
print-names against %3chr-str%*.
If the atom
with print-name %3chr-str%* does not appear in that bucket we are assured
that it does not appear anywhere in the table. In this case we
create a new atom structure and add it to that bucket.
Here is a sample hashing function:
%21.%* Assume that we have N+1 buckets, numbered 0, 1, 2 ... N.
%22.%* Take the representation of %3chr-str%* (it's a number) and
divide that number by N+1.
%23.%* Look at the remainder. It's a number between 0 and N.
%24.%* Use that remainder as the index to the appropriate bucket.
The LISP atom table, usually called ⊗→%3OBLIST%*↔← (for %2⊗→object list↔←%*),
is a list of buckets. Each bucket is a list of the atoms which `hash' to that bucket.
We actually represent the object list as an array named %3oblist%*.
Arrays are discussed in full in {yonss(P137)}, but basically are an efficient
storage representation for sequences of known length. We typically allocate
a block of sequential cells and use the addressing scheme of the machine's hardware
to do a rapid subscript calculation.
The hash number will give us the array subscript and we can
go to the right bucket immediately.
We won't have to go %3cdr%*-ing down the object list to get to the bucket.
.GROUP
.P150:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂αααπααα⊃
⊂ααααααα←ααααααα←αααα←βα# ~ #αβα⊃
~ εαααβαααλ ↓
~ ~ ~ ~←$
~ ⊂ααα←ααα←ααα←βα# ~ #αβ→⊃
↓ ↓ εαααβαααλ ↓
... (to bucket 2) ... ...
↓ ↓ εαααβαααλ ↓
~ ... ⊂α←αβα# ~ ≤'~←$
~ ↓ εαααβααα$
~ %ααααααα→αααααα→αααααα⊃
↓ ↓
(to bucket 1) (to bucket n)
~ ~
~ ~ ⊂αααπααα⊃ ⊂αααπαα⊃
~ ⊂αααπααα⊃ ⊂αααπαα⊃ %α→~ # ~ #αβ→ ...→~ # ~≤'~
%→~ # ~ #αβ→...→~ # ~≤'~ %αβα∀ααα$ %αβα∀αα$
%αβα∀ααα$ %αβα∀αα$ ↓ %→ ...
~ ~ ⊂απααα⊃
(to atom 1: (atom m: ~∃~ #αβ→(p-list of
bucket 1) bucket 1) %α∀ααα$ atom 1: bucket n)
~ ↓
↓ ⊂απααα⊃ ⊂αααααααπααα⊃ ⊂αααπααα⊃ ⊂ααααααπααα⊃ ⊂αααπαα⊃
... ~∃~ #αβα→~ PNAME ~ #αβα→~ # ~ #αβ→~ SUBR ~ #αβ→~ # ~≤'~
%α∀ααα$ %ααααααα∀ααα$ %αβα∀ααα$ %αααααα∀ααα$ %αβα∀αα$
↓ ↓
⊂αααπαα⊃ to primitive code
~ # ~≤'~ for ∞3car∞*
%αβα∀αα$
↓
⊂ααααα⊃
~CAR≡≡~
%ααααα$
.END
.BEGIN CENTERIT;SELECT 2;
←Partial Object List; where atom m:bucket 1 is %3CAR%*
.END
.APART
Note: The top level of %3OBLIST%* is stored sequentially for fast access
by the hasher; the %3cdr%*-parts are chained together in a (sequential)
list so that the structure of the table will be like any other list. The
chained representation is used by any other LISP process⊗↓In particular,
the garbage collector uses this linking. As a further implementation note,
the implementors of MacLISP noted the frequent use of single character atoms
and added a special section to the top-level of the object list. A contiguous
block of cells, of size equal to the number of characters, was added. On reading
a single character atom, the corresponding entry in the table is examined.
A %3NIL%1 says the atom hasn't been seen before; otherwise its p-list
representation resides there.←.
MacLISP embellishes the basic %3OBLIST%1 in an important way. That system will
allow several object lists to exist within the system. This is useful since several
cooperating LISP subsystems may exist; for example the LISP editor, debugger, and
compiler are all written in LISP and may all be used within the same interactive session.
The difficulty is that each of those subsystems may use names which conflict
with names in the user's programs. Multiple object lists are a way to overcome this
problem. One one object list is current at any time, but several may exist.
There are functions to create object lists, and object lists are swapped by
λ-binding to the identifier %3obarray%1⊗↓An object list is
called object array in MacLISP.←.
.GROUP;
Consider:
.BEGIN CENTER;SELECT 3;
λ[[obarray]%9x%3][ob%41%3].
.END
Assuming that %3ob%41%1 is bound to an object list,
then within the evaluation of %9x%1 the symbols and bindings of
%3ob%41%1 would be accessible.
.APART
Should a linear search and storage technique,
like %3assoc-pairlis%1,
or a more complex technique like hashing, be employed?
It depends on the application, and the speed and size of the machine.
The hash table takes extra space both for storage and for program, but
gives a faster search time. The linear technique requires less space, but
can be quite slow. ref ⊗↑[Ste#pc]↑, ⊗↑[Har#75]↑.
Several books cover searching and sorting in great detail (⊗↑[Gri#71]↑, ⊗↑[Knu#xx]↑).
Finally, we want to present a version of ⊗→%3ratom%*↔← which uses the organization
we just described.
In this description, we will restrict ourselves to recognition of atoms, leaving
the reader to supply the necessary parts for recognition of numbers.
We will recognize three classes of characters:
.BEGIN INDENT 0,10;GROUP;
%21.%1 The class of letters will include the alphabetic characters.
.END
.BEGIN INDENT 0,3;GROUP;
%22.%1 The class of delimiters consists of those characters which signal
the end of an atom. For this scanner we assume space and carriage control
are delimiters.
.END
.BEGIN INDENT 0,3;GROUP;
%23.%1 The special characters will consist of "(", ")" and "." .
.END
Special characters also act as delimiters in LISP and this results in a
slight complication. Consider the partial string "%3AB#)C%1".
Our scanner should scan the "%3A%1", scan the "%3B%1", and scanning the space,
should recognize a delimiter. It
should recognize the %3AB%1 as an atom, and signal %3read%1. The string
will be reduced to "%3)C%1". The next time %3read%1 calls %3ratom%1 the
right parenthesis will be seen, recognized as a special character and
an indication of that will be returned to %3read%1.
Now consider the string "%3AB)C%1"; %3ratom%1 will scan "%3A%1" and "%3B%1"
as before. It will then scan the ")". It now needs to do %2two%1 things;
it must signal %3read%1 about the atom it has seen, but it must also
remember the ")" so that the %2next%1
time %3read%1 asks for information, it sees the ")" and not the "%3C%1".
We handle this problem by using a global variable named %3lst_chr%1.
This variable is initialized to %3NIL%1 and remains that way until
our anomalous situation occurs. At that time the special character is
placed in %3lst_chr%1, and %3ratom%1 exits normally. So, whenever %3ratom%1
is called, the first thing it does is check the contents of %3lst_chr%1.
If it is non-empty, its contents is returned, as %3lst_chr%1 is set empty
again.
.BEGIN TURN OFF "←";SELECT 3;TABIT2(15,17);
.GROUP;
ratom <=λ[[]prog[[chr]
\\[lst_chr → swap[lst_chr;chr];return[chr]];
\a\chr ← readch[];
\\[is_let[chr] → stuf_buf[chr];ratom%41%3[];
\\ is_delim[chr] → go[a];
\\ is_spec[chr] → return[chr]]]]
.BEGIN FILL;SELECT 1;
This procedure uses tricks advertised in {yonss(P214)}, using
%3lst_chr%1 as a predicate, and knowing that %3prog%1 variables are
initialized to %3NIL%1 and that the representation of %ef%1 is %3NIL%1.
With that knowledge, %3swap%1 swaps the contents of %3chr%1 and %3lst_chr%1.
The routine %3readch%1 gets the next character, and %3stuf_buf%1
is used to save the character string which is to become an atom.
The character string is built up in %3buf%1.
.END
.APART
.GROUP
ratom%41%3 <= λ[[]prog[[chr]
\l\chr ← readch[];
\\[is_delim[chr] → return[intern[buf]];
\\ is_spec[chr] → lst_chr ← chr; return[intern[buf]];
\\ %et%3 → stufbuf[chr]; go[l]]]]
.BEGIN FILL;SELECT 1;
If %3ratom%41%1 sees a special character it is saved in %3lst_chr%1.
.END
.APART;
.GROUP;
intern <= λ[[l][prog[chr-str]
\\chr-str ← maknam[buf];
\\i ← hash[chr-str]
\\bucket ← oblist[i];
\a\[null[bucket] → return[insert[chr-str]]];
\\at ← get[first[bucket];PNAME];
\\[right-one[at;chr_str] → return[first[bucket]]];
\\bucket ← rest[bucket];
\\go[a]]]
.BEGIN SELECT 1;FILL;
%3maknam%1 takes our character string and converts it into an
appropriate numeric representation.
%3hash%* returns the bucket number of its argument, and
%3insert%* builds the atom and inserts it into a bucket.
%3right-one%* is a predicate used to check if an atom
has the right print-name.
.END
.END
.P157:
Typically an implementation of %3ratom%1 will be done so that
the class of special characters and delimiters can be varied. This is
done using a representation of a character table whose name entries
are characters, and whose value entries determine the %3ratom%1 properties.
This is done to allow LISP users to define their own parsers and scanners.
LISP's modifiable input routine, coupled with its data structures and
extendible evaluator make LISP an excellent tool for building more sophisticatd
language systems.
On {yon(P204)} we introduced the abbreviation %9`%3x%1 for %3quote[x]%1.
Such an abbreviational facility is present in several versions of LISP and
it is the duty of %3ratom%1 to recognize such constructs.
Whenever %3ratom%1 sees the prefix %9`%1 it
reads the next S-expr %9α%1 and returns the list %3(QUOTE %9α%3)%1 as
value. This %3quote%1 facility is an instance of a device called
a ⊗→%3read%1#macro↔←.
In some systems (⊗↑[Moo#76]↑) the users may define their own %3read%* macros.
For example a definition like:
.BEGIN CENTER;SELECT 3;
' <%4r%3= λ[[] list[QUOTE;read[]]]
.END
might signal LISP to change the character table for "'" to be a %3read%1#macro
and when an instance of "'" was recognized, the input string would be passed to
%3read%1, the resulting S-expr %3list%1-ed with %3QUOTE%1, and returned
to %3read%1.
MacLISP also defines a comment facility using a %3read%1#macro.
The occurrence of a semi-colon signals the beginning of a comment; all
characters to the end of the line are consider commentary.
Several implementations also include devices to decrease the number of parentheses
needed. For example "[" and "]" are often defined to be "super-parentheses".
The "[" acts like a "(" but its
scope runs to the next "]", constructing sufficient
")" to balance the intervening expression.
Similarly, the scope of a "]" extends to the prior matching "["; if none
exists, the expression is completed by supplying sufficient ")" to balance.
.GROUP;
For example:
.BEGIN CENTER;SELECT 3;
((A B) ((C D E))) = ((A B) ((C D E))] = [(A B) ((C D E))] = ((A B) ((C D E]
.END
.APART
.END "FOO";
Regardless of the specifics of the implementation, the input routines
will read in a list representation of a LISP expression and
convert it into an S-expr. For example,
let's see what happens if we want to evaluate
.BEGIN CENTER;SELECT 3;
eq[x;A].
.END
This will be presented to the machine as:
.P156:
.BEGIN CENTER; SELECT 3;
(EQ X (QUOTE A))
.END
.APART
.GROUP;
%3read%1 will begin parsing the sequence of characters; it will
depend on %3ratom%1 to return indications of the special characters, and
will depend on %3ratom%1 to properly represent each occurrence of an atom.
The parser knows about the representation we have chosen for lists
and will use %3cons%1 to build up the S-expression form:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂ααααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
~ EQ ~ #αβα→~ X ~ #αβα→~ # ~≤'~
%αααα∀ααα$ %ααα∀ααα$ %αβα∀αα$
~ ⊂αααααααπααα⊃ ⊂αααπαα⊃
%→~ QUOTE ~ #αβα→~ A ~≤'~
%ααααααα∀ααα$ %ααα∀αα$
.END
.APART
The references to the atoms
%3EQ, X, A,%1 and %3QUOTE%1 are actually pointers to the
atoms.
Each atom is located only once by the reader. After that we have direct access to
atom and its property list.
Since the input routines perform several %3cons%1 operations; we should look at
the details of %3cons%1.
.SS(A first look at %3cons%1)
The effect of the ⊗→%3cons%*↔← function is quite different from that
of the other LISP primitives. The other functions (or
predicates) manipulate existing S-expressions, whereas %3cons%*
must construct a new S-expression from two existing S-exprs.
That is, given representations of two S-exprs, say %3x%* and %3y, cons[x;y]%*
is to get a new cell, put the representation of %3x%* in the %3car%*-part of
the cell and the representation of %3y%* in the %3cdr%*-part and return a pointer
to the new cell:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
∞1result of ∞3cons[x;y]∞B
~
~ ⊂αααααπααααα⊃
%ααααααααα→~ # ~ # ~
%ααβαα∀ααβαα$
~ ~
⊂αααα$ %αααα⊃
~ ~
↓ ↓
∞1rep. of ∞3x ∞1rep. of ∞3y∞1
.APART
.END
When a computation
is begun, only the atom structure for the initial LISP symbol table
uses cells in the pointer#area. The remaining pointer#cells
are linked together and form the %2⊗→free space list↔←%* or FS list.
Whenever %3cons%* needs a cell, the first cell in the FS list is used and
the FS list is set to the %3rest%* of the FS list. For example the following
represents the effect of %3cons[A;B]%*:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
∞3env∞* ∞3dest∞*
↓ ↓
⊂ααπαα⊃ ⊂ααααα⊃
~ ~ #α→... ~ #αβα⊃
εααβααλ εααπααλ ~
~ ~ #β→⊃ ... ↓
εααβααλ ↓ ~ ~ ~←$
~ ~ #βαααα⊃ εααβααλ Pt∞4FS∞*
%αα∀αα$ ↓ ↓ ... ↓
~ ~ %αα∀αα$ ~
~ %→ααα→αα→⊃ ~
↓ ↓ ↓
⊂απααα⊃ ⊂απααα⊃ ⊂ααπααα⊃ ⊂ααπααα⊃ ⊂ααπαα⊃
~∃~ #αβ→... ~∃~ #αβ→ ... ~≤'~ #αβ→~≤'~ #αβ→ ...→~≤'~≤'~
%α∀ααα$ %α∀ααα$ %αα∀ααα$ %αα∀ααα$ %αα∀αα$
atom A atom B
.BEGIN CENTER
∞2Before∞1
.END
.END
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
∞3env∞* ∞3dest∞*
↓ ↓
⊂ααπαα⊃ ⊂ααααα⊃
~ ~ #α→... ~ #αβα⊃
εααβααλ εααπααλ ~
~ ~ #β→⊃ ... ←$
εααβααλ ↓ ~ ~ #ααα→ααααα→αααα→⊃
~ ~ #βαααα⊃ εααβααλ ↓ Pt∞4FS∞*
%αα∀αα$ ↓ ↓ ... ~ ↓
~ ~ %αα∀αα$ ~ %ααα→⊃
~ %→ααα→αα→⊃ ~ ~
↓ ↓ ↓ ↓
⊂απααα⊃ ⊂απααα⊃ ⊂αααπααα⊃ ⊂ααπααα⊃ ⊂ααπαα⊃
~∃~ #αβ→... ~∃~ #αβ→ ... ~ # ~ # ~ ~≤'~ #αβ→ ...→~≤'~≤'~
%α∀ααα$ %α∀ααα$ %αβα∀αβα$ %αα∀ααα$ %αα∀αα$
↑ ↑ ↓ ↓
%ααααα←ααααα←ααααα←ααααα←αααααα$ ~
↑ ↓
%←ααααα←ααααα←αααααα$
.BEGIN CENTER
∞2After∞1
.END
.END
As the computation continues, cells are taken from the FS list.
When a %3cons%* operation needs a cell and the FS list is empty, the
computation is suspended and the %2⊗→storage reclaimer↔←%* or %2⊗→garbage collector↔←%* is
called. The job of the garbage collector is to locate cells for a new
FS list.
.SS(Storage management: garbage collection,garbage collection,P146:)
During the course of a computation, contents of cells which were taken
from the FS list often become unnecessary. For example, if we ask LISP to evaluate
something as simple as:
←%3(CONS (QUOTE A)(QUOTE B)),%* many cells are used:
.GROUP;
%21.%* At least seven cells are needed just to read in the expression:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂ααααααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
~ CONS ~ #αβα→~ # ~ #αβα→~ # ~≤'~
%αααααα∀ααα$ %αβα∀ααα$ %αβα∀αα$
↓ ↓ ⊂αααααααπααα⊃ ⊂αααπαα⊃
~ %→~ QUOTE ~ #αβα→~ B ~≤'~
~ %ααααααα∀ααα$ %ααα∀αα$
~ ⊂αααααααπααα⊃ ⊂αααπαα⊃
%α→~ QUOTE ~ #αβα→~ A ~≤'~
%ααααααα∀ααα$ %ααα∀αα$
.END
.APART
If some of the atoms are not present in the atom table,
more cells will be needed.
%22.%* One cell will be needed to perform the %3cons%* operation. See
the previous example.
After the computation is completed, LISP will print "%3(A#.#B)%1"
and wait for more input. After the %3print%1 statement is completed
none of the eight mentioned cells are needed.
They are garbage.
Why not simply return these `garbage cells' explicitly to the FS list?
That could be done in this case, but
frequently it is difficult to know exactly which cells
%6are%* garbage. Experiments have been performed in which LISP
programmers were allowed to return `garbage' to the FS list themselves.
The results were disastrous; often list structure thought to be garbage
was returned to the FS list by the programmers, who were unaware it was still being
used by other computations. We will see where these difficulties arise
later.
Therefore the responsibility for
reclamation is passed to the LISP system. The %3cons%* function
removes cells from the FS list, and
its FWS counterpart %3fwcons%*,
removes cells from the FWS list when making numbers or print-names.
These two functions are the only functions allowed
to manipulate the free storage lists. When either list becomes empty, the
garbage collector is called.
%2←The fundamental assumption behind garbage collection is:%*
.NARROW 10,10;
At any point in a LISP computation, all cells which contain
parts of the computation are reachable (by %3car-cdr%* chains)
through a fixed set of known cells or base registers.
.WIDEN;
The first phase of the garbage collector, called the %2⊗→marking phase↔←%*,
marks all of the list structure which is currently active (reachable
through the above mentioned fixed cells). This should include all the
atoms in the atom table and all the cells reachable by %3car-cdr%* chains
from any of these atoms. Any partial computations which have been
generated must also be marked.
In terms of our discussion, we mark from the oblist, from %3dest%1,
from %3control%1 (see {yon(P200)}).
If deep binding is used, we mark the elements reachable
through %3env%1.
If shallow binding is used, then marking of %3oblist%1
will capture all the values; even the ones which may not be accessible
as λ-values. What we should do instead is mark the non-value properties
on atoms using %3oblist%1; we then mark
the values separately using the current %3env%1 skeleton tree.
Once all the active structure has been
marked, we proceed to the %2⊗→sweep phase↔←.%*
When the marking has been completed, we go linearly (sweep) through memory,
collecting all those cells which have not been marked. Again,
the assumption is that if cells are not marked in the first phase,
then they do not contain information necessary to the LISP computation.
These unmarked cells are chained together via their %3cdr%*-parts to form
a new FS list. The FS pointer is set to the beginning of this list.
The unmarked cells in FWS comprise the new FWS list.
If there is sufficient room in a full word to contain a pointer, then we
chain the words together; otherwise we must designate the FWS list
some other way.
.CENT(A simple LISP garbage collector written in LISP)
We will now write a garbage collector in LISP to mark and sweep nodes
in FS and FWS.
More complex algorithms will be discussed in {yonss(P27)} and on {yon(P144)}.
The present algorithm will have three main functions:
.BEGIN INDENT 0,10;
%3initialize[x;y]%* to initialize the marking device for each cell in the space
between %3x%* and %3y%*.
%3mark[l]%* to do the actual marking of a list %3l%*.
%3sweep[x;y]%* to collect all inaccessible cells in the space delimited
by %3x%* and %3y%*.
.END
%3initialize%* will be called twice; once for FS and once for FWS.
%3mark%* will be called for each base register which points to
active list structure.
The basic structure of %3mark%1 is: if the word is in FWS mark it and return;
if the word has already been marked simply return
since we are assured that any cells further down the structure
have already been marked. Otherwise the word is
in FS and thus has a %3car%* and a %3cdr%*; mark the word;
recursively mark the %3car%*;
recursively mark the %3cdr%*.
%3sweep%* will be called twice; once to generate a new free space list and once
to generate a new full word space list. Elements of these free
lists will be chained together by their %3cdr%* parts.
The initialization and sweep phases of this garbage collector are very similar;
both phases must be assured of reaching every node in the space.
These main functions use several other functions and predicates:
.BEGIN INDENT 0,10;
%3fwswrdp[x]%* is true just in the case that %3x%* is a word in FWS. This is
used as one of the termination conditions of %3mark%*.
%3markA[x]%* marks word %3x%* as accessible.
%3markNA[x]%* marks word %3x%* as not accessible.
%3Ap[x]%* is true if word %3x%* is marked "accessible".
%3up[x]%*: If %3x%* is at location %bn%* then %3up[x]%* is location %bn+1%*.
%3rplacd[x;y]%* modifies %3x%* by replacing its %3cdr%1-part with %3y%*.
The value returned is the new %3x%1.
.END
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
∞3dest∞* ∞3env∞*
~ ~
~ ⊂ααααααα⊃ ~ ⊂ααααααα⊃
%→ ~ #αβα⊃ %→ ~ #αβαα ### →
εαααπαααλ ↓ εαααπαααλ
# # # ←$ ~ x ~ # βα→α⊃
~ ~ β→ - - →⊃ εαααβαααλ ~
εαααβαααλ ~ y ~ #αβ→⊃ ↓
~ ~ ~ ↓ %ααα∀ααα$ ↓ ~
# # # ⊂αααπααα⊃ ⊂ααααααα⊃ ~ ↓
εαααβαααλ ~ ? ~ # β→ - - - →~ ? ~←$ ~
~ ~ ~ %ααα∀ααα$ %ααααααα$ ~
%ααα∀ααα$ ↑ ↓
%←ααααα←αααα←ααααα←ααααα←ααα$
.BEGIN CENTER;SELECT 2;
Algorithm for ∞3rplacd∞1
.END
.END
Can you write %3rplacd%* as a LISP function?
.BEGIN SELECT 3;TABIT2(15,17);TURN OFF "←";GROUP
initialize <= λ[[x;y] prog[[]
\ a\markNA[x];
\\x ← up[x];
\\[eq[x;y] → return[%et%3]];
\\go [a]]]
.END
.BEGIN SELECT 3;TABIT2(15,20);TURN OFF "←";GROUP
mark <= λ[[l] [\Ap[l] → %et%3;
\fwswrdp[l] → markA[l];
\%et%* → markA[l]; mark[car[l]]; mark[cdr[l]] ]]
.END
.BEGIN SELECT 3;TABIT2(13,15);TURN OFF "←";GROUP
sweep <= λ[[x;y] prog[[z]
\\z ← ();
\ a\[not[Ap[x]] → z ← rplacd[ x;z]];
\\x ← up[x];
\\[eq[x;y] → return [z]];
\\go[a]]]
.END
As indicated above, there are alternatives to garbage collection.
If the data-structure manipulations are particularly simple
then leaving storage management in the programmer's hands might
suffice. However, LISP generates very intertwined structures.
.P145:
Perhaps there is
another way to allow the system to handle storage management.
First notice that storage management becomes quite simple if there is no sharing
of sublists. The question of when to return a list to the free space list
is easily solved. However it is desirable to share substructure quite often.
Sharing can obviously save space and careful modification of shared structures
can communicate global information between algorithms.
Instead of using a garbage collector, we might associate a counter, called
a %2⊗→reference counter↔←%*, with each
list when it is built. In that counter we will store the number of references
to that list. Thus the counter will be initialized to 1 when the list is created.
Whenever the list
is shared we increase the counter by 1; whenever the list is no longer to
be shared by some list structure, we decrease the counter by 1.
When the count goes to 0 we can put the cells of the list in the free space list.
A difficulty with the reference counter scheme is the
inability to collect
circular lists. Consider the following sequence:
.BEGIN NOFILL;GROUP;TURN OFF"←";
%21.%* Manufacture a list, %3x%*: %3x ← (B I G M O T H E R L I S T)%1. Reference count is 1.
%22.%* Circularize it: %3x ← circle[x];%*. Reference count is now 2.
%23.%* Delete all references to %3x%*: %3x ← NIL%*. Reference count reverts to 1.
.END
The list is no longer referenced,
but it is not on the free space list, and has thus
been lost to the system.
Two less serious considerations should be mentioned in conjunction with
reference counters. First, each node which is to be collected with this scheme
must have an associated reference field to contain the count. That requires
extra space, and usually imposes a maximum size for the reference count.
If that maximum is reached,
either an additional space is allocated, or
the filled count may never be decremented and the associated
structure must be garbage collected.
The second problem involves decrementing counts. Whenever a count goes to zero
the counts associated with its immediate predecessors must also be decremented.
This process is applied recursively until non-zero counts are encountered.
The bookeeping for such a task is non-trivial.
However there are significant storage management problems which
are amenable to reference counting. Parts of the implementation of LISP
systems can profitably use reference counting. We will discuss some of
these aspects in {yonss(P228)}.
.CENT(Problems)
.P154:
I This problem deals with what is known in LISP as %2⊗→hash consing↔←%*
(⊗↑[Got#74]↑).
We have been storing atoms uniquely, but it should be clear from the behavior
of %3cons%* that non-atomic S-exprs are %2not%* stored uniquely.
Certainly storing single copies of any S-expr would save space. For example,
the non-atomic structure of
%3((A . B).(A . B))%* could be represented with two cells rather than three.
Unique storage is not without its difficulties. What problems do you foresee
in implementing such a scheme?
II We said on {yon(P48)} that many LISP computations generate list structure
rather than true LISP-trees. Give an example.
.P55:
III Can you write a LISP function %3circle <= λ[[x] ...]%* which will take
a list %3x%* and make it circular. Thus:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂ααααααααααααα←ααααααααααααα⊃
⊂αααααααααπααα⊃ ⊂αααααπααα⊃ ↓ ⊂ααααπααα⊃ ⊂αααααααπααα⊃ ↑
~ NOTHING ~ #αβα→~ CAN ~ #αβα∀→~ GO ~ #αβα→~ WRONG ~ #αβ→$
%ααααααααα∀ααα$ %ααααα∀ααα$ %αααα∀ααα$ %ααααααα∀ααα$
.END
This list is circular on its "last two" elements.
Printing such structures is not possible using the %3print%1 function.
IV What LISP operations generate such intertwined structures that a reference
counter implementation would be infeasible?
.GROUP;
.SS(A review of the structure of the LISP machine,LISP machine)
We have a good portion of the storage conventions for LISP set out.
A difficult area involves the organization of the data structures
to perform the correct binding and unbinding of variables. Before
we tackle that, we give a diagram showing the basic structure of LISP
memory.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN ON "?" FOR "\";
.TABS 50;
⊂αααααααααααααααααααααααααααααααααααααααα⊃
~?~
~ ### THE SUBCONSCIOUS ###?~
~ ∞3eval∞* and friends?~
~ ∞3read∞* and ∞3print∞*?~
~ the garbage collector?~
~ the base registers for marking;?~
~ these include: ?~
~ FS pointer?~
~ FWS pointer?~
~ atom table(∞3OBLIST∞*) pointer?~
~ registers for partial results?~
~ ∞3dest, control∞*?~
~ the access chain?~
~?~
εααααααααααααααααααααααααααααααααααααααααλ
~?~
~ ### POINTER SPACE ###?~
~ the free space list?~
~ those parts of S-exprs containing?~
~ ∞3car∞*- and ∞3cdr∞*-parts.?~
~?~
εααααααααααααααααααααααααααααααααααααααααλ
~?~
~ ### FULL WORD SPACE ###?~
~ the full word space list?~
~ atom print names?~
~ numbers?~
~?~
%αααααααααααααααααααααααααααααααααααααααα$
.END
.CENT(Structure of LISP memory)
.APART
.SS(Implementations of binding,,P78:)
In {yonss(P6)} and {yonss(P134)}we discussed deep binding and shallow binding
respectively. That discussion took place at a reasonably abstract level.
We would like to discuss these binding implementations in more detail.
We first will discuss some of the possible pitfalls in the implementation of LISP;
then we will give deep and shallow implementations for an important subset of LISP.
Finally, we will discuss some of the methods available for implementing the full
language in an efficient manner.
Though much of this discussion deals with the binding stategies of LISP,
and therefore with control structure, we are restricting ourselves to the
data structure requirements. The next chapter discusses how these data structures
are to be manipulated.
.GROUP;
Consider the evaluation of a form:
.BEGIN CENTERIT;SELECT 3;
←f[a%41%*; ...;a%4n%*]%1 where:
%3←f <= λ[[x%41%*; ... x%4n%*] ...g[ ...] ...i[ ...]],
←g <= λ[[ ...] ...h[ ...] ], %1and %*
←i <= λ[[ ...] ...j[ ...] ]
.END
.APART
.GROUP;
Typically a picture like the following occurs, where the
instance of function name means a block of λ-bindings necessary to
begin evaluation of that function:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂ααα⊃ ⊂ααα⊃
~ h ~ ~ j ~
⊂ααα⊃ εαααλ ⊂ααα⊃ ⊂ααα⊃ εαααλ ...
~ g ~ ~ g ~ ~ g ~ ~ i ~ ~ i ~
⊂ααα⊃ εαααλ εαααλ εαααλ ⊂ααα⊃ εαααλ εαααλ ...
~ f ~ ~ f ~ ~ f ~ ~ f ~ ~ f ~ ~ f ~ ~ f ~
%ααα$ %ααα$ %ααα$ %ααα$ %ααα$ %ααα$ %ααα$
.END
.APART;
We build up a stack of λ-binding blocks as we continue to enter procedures, and
as we leave a procedure we remove that block of bindings from the stack.
When we wish to know the value of a variable we look down the stack for the
first occurrence of that variable; the associated binding is the desired value.
LISP allows functional arguments and functional values. These constructs
require modification of the behavior modeled in the simple blocks world.
When we recognize a functional argument, we remember the block which was currently
on the top of the stack. When we apply that functional argument, intervening
blocks will have been stacked; we change the environment such that variable lookup
takes place beginning at the saved block, rather than at the top of the stack.
However, if %3h%* say, generated a functional value which is to be applied
in the context of %3j%* then
we must retain those
values in the %3f-g-h%* stack in such a way that they may be used to restore
the enviroment when we desire to apply the functional value in the %3f-i-j%* stack.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂ααα⊃ ⊂ααα⊃
↑ ~ h ~ ~ j ~ ~
~ εαααλ εαααλ ↓
bind ~ ~ g ~ ~ i ~ ~ unbind
↑ εααα∀αα∀αααλ ↓
~ ~ f ~ ↓
%αααααααααα$
.END
We will discuss data structure requirements for
implementation of these three LISP subsets: simple function
application, functional arguments, and functional values.
The mechanism which we described in the inital blocks model
occurs quite naturally in computer science.
It is called a %2⊗→stack↔←%*⊗↓Stacks are closely related to
a theoretical device called push-down automaton. There, only the top element
of the stack is accessible. We take a more pragmatic position, allowing
access to elements within the stack, and indeed modification of
elements within the stack; but removal of elements
from within the stack is not allowed. To remove an element, we must
first remove the elements above it.←.
The important characteristics of stacks are that they are lists such that
additions and deletions can only be made at the front of the list.
What is of interest to us now is that stacks have a particularly
efficient implementation. Due to the very regular way in which stacks are
manipulated, the ⊗→linked allocation↔← implementation is not necessary.
Instead a stack can be implemented as:
.BEGIN INDENT 10,10;GROUP
%21.%* A sequence of contiguous locations
%22.%* A pointer initialized to point %2before%* the first of these locations.
.END
.BEGIN INDENT 10,10;GROUP
%23.%* An operation, typically called %2push%* which places a new
object in the stack. This can be accomplished by
adding 1 to the value of the stack pointer, and then
putting a representation
of the object in the cell currently referenced by the pointer.
%24.%* An operation called %2pop%* which gets the first value in the stack
and then decrements the pointer by 1.
%25.%* Though the abstract structure of a stack does not involve
limitations on the length of stack-space,
any representation should include techniques
for assuring that the stack pointer stays within its allotted space.
.END
Notice that the %3concat%* operation can be interpreted as %2pushing%1
and the %3rest%* operation as %2popping%1. Indeed our earlier manipulations
of symbol tables effectively used such stack operations.
This is particularly apparent in the representation
of symbol tables given on {yon(P43)}.
.GROUP;
We will discuss the binding process in terms of a
sequence of three events:
.BEGIN INDENT 0,5;
%21.%1. %3bind%1 describes what the implementation does
when we are ready to call a procedure. The actual parameters are evaluated
and we are ready to add them to the environment and evaluate the body of the
procedure.
%22.%1 %3lookup%1 will discuss how values are located in the current environment.
%23.%1 %3unbind%1 will describe what has to be done as we prepare to exit from
a procedure.
.END
.APART
.SS(stack implementation of deep binding,,P148:)
The stack implementation of simple function application is a
straightforward stack implementation of our blocks picture.
We have two stacks which operate synchronously. One stack is called
the %2⊗→name stack↔←%1; the other is called the %2⊗→value stack↔←%1.
The name stack maintains the λ-variables (and
generated names, if a primitive is called). The top of the name stack
defines the origin of the environment. When the value of a variable is
requested %3lookup%1 proceeds down the name stack, looking for the first
occurrence of the variable. The corresponding position in the value stack
contains the desired value.
When we recognize a function application, we begin the evaluation of the
actual parameters. As each parameter is evaluated, the result is pushed into the
value stack. When all the parameters have been evaluated, we are ready to evaluate
the body of the expression. At that point %3bind%1 pushes the λ-variables onto the name
stack.
When we complete the evaluation of the body of the expression
%3unbind%1 pops the λ-variables
off of the name stack, and pops their values off of the value stack.
.GROUP;
Since the λ-variables leave the stack as a group we can sometimes speed things up
by storing block sizes in the stack. Also the word size of the machine may
allow using one stack for both names and values. For example:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂ααααααααααααα⊃
~ #ααα→ααβ→⊃
εαααααααπαααααλ ↓
~ namen ~ valn~ ~
εαααααααβαααααλ ~
# # # ↓
~ name2 ~ val2~ ~
εαααααααβαααααλ ~
~ name1 ~ val1~ ↓
εααααααα∀αααααλ ~
~ ~←$
~ #ααα→ααβ→⊃
↓
.END
.APART
Where %bval1%1 is furthest down the stack since it was pushed on first,
and where all references to %bnamei%1 and %bvali%1 are really pointers to
to appropriate S-expressions (atoms or dotted pairs).
The "back pointer" is used for removing blocks of bindings, but it is also
a representation of the control link discussed in {yonss(P77)}.
A slight modification of this scheme is sufficient to support
evaluation of functional arguments. The additional piece of information
guides %3lookup%1 in its search for the next block of bindings.
Instead of proceeding linearly down the stack, %3lookup%1 proceeds linearly
through a block and at the end of each block is information telling %3lookup%1
where the next block of bindings is to be found. Recall that information
is called the access link ({yonss(P77)}).
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂ααααααααααααα⊃
⊂←β←αα# #αα→β→⊃
↓ εαααααααπαααααλ ↓
~ ~ namen ~ valn~ ~
~ εαααααααβαααααλ ~
access links ↓ # # # ↓ control links
~ ~ name2 ~ val2~ ~
~ εαααααααβαααααλ ~
↓ ~ name1 ~ val1~ ↓
~ εααααααα∀αααααλ ~
%→~ ~←$
⊂←β←αα# #αα→β→⊃
↓ ↓
.END
In the majority of cases, the access link points at the next block down the stack.
The use of functional arguments will generate an access link, pointing
further down the stack.
The %3function%1-construct
is responsible for saving the binding environment. In this implementation
this is accomplished by saving a pointer to the current top of the name
stack.
When a functional argument is applied, the access link will be set to
the binding environment ({YON(P215)}).
Since we are dealing with functional
arguments, we are assured that the application of that argument will occur
within an expression which dynamically surrounds the creation of the
functional argument. This means that our environment pointer will indeed be a
pointer down the stack. The use of functional values invalidates that assumption.
Before examining that problem we will discuss shallow binding for
the same LISP subsets.
.SS(stack implementation of shallow binding,,P225:)
A stack implementation of shallow binding allows some elegant economies.
Instead of having a set of %3VAR%1 entries associated
with each atom, we will have %2one%1 value#cell, and will maintain previous
bindings of the variable on a stack.
The %3VAR%1 property will be combined with the other value properties
which an atom can carry. Thus %2any and all%1 value properties will
be found through the value cell.
The implementation of %3lookup%1 will be
simple: take the value in the value cell. The implementation of %3mkenv%1
will require a bit more work than that expended in the deep binding case;
and the work required on leaving a procedure will be more substantial
than popping the name-value stack.
Since the the value cell will be maintained to %2always%1 contain the
proper binding of a variable, the distinction between local and
non-local variables goes away. The contents of the value cell is %2the%1
current value for any variable.
When we enter a procedure, we will
save the old contents of the value cell and replace it with the new
value. When we leave a procedure we will restore that saved value.
Since procedures can be called recursively we will need stack-like devices
for saving old values.
The justification for the implementation comes from an examination of the
shallow binding strategy as described in {yonss(P134)}.
There we had a collection of bindings associated with the identifier.
Without loss of generality,
we can organize %3mkenv%1 such that the new binding is added in front
of any previous binding.
Now if we are only
evaluating simple function applications, %3lookup%1 will find
the desired binding
provided %3unbind%1 removes the first binding as it exits the procedure.
Thus %3bind%1⊗↓Also known as %3send%1 on {yon(P218)}.← and %3unbind%1
act on the value entries in a stack-like fashion.
Instead of associating a separate stack with each variable and accessing
the value through the top of the stack, we associate a single value cell
with each variable and store the saved values of all variables
on a common stack.
Since we allow atoms to have at most one value property at any time,
we will combine all these values under the common indicator %3VALUE%1.
So the value cell is the property value of the property %3VALUE%1.
Also, we have a stack, called SP
in which we can
save old values of variables.
%3bind%* first saves both the value
and the location of the value#cell for each variable being rebound;
it then puts a special mark in the top of the SP stack to delimit
each block of λ-rebindings.
All %3unbind%* need do is restore the first block
of saved values. It knows the values and also knows the addresses of
the value#cells⊗↓An alternative implementation of %3unbind%1 would only
store the saved values, without explicitly saving the locations, and
without marking the stack. In this implementation %3unbind%1 would
take arguments describing which variables needed restoring.←.
All of the information it needs for restoration is saved in the stack.
This stack of previous values is also visited by the garbage collector;
it may be that the only copy of some value is accessible only through
the SP-stack. It would be most unfortunate if the garbage collector neglected
to mark that entry and the unbinding mechanism later tried to restore the value.
.GROUP
Here's a sample session with the binding mechanism:
Assume %3X%* and %3Y%* are currently bound to %3(A#.#1)%* and
%3(B#.#2)%* respectively; and assume we wish to evaluate:
.BEGIN CENTER;
%3((LAMBDA (X#Y) (%9x%*# X Y)) (QUOTE C) (QUOTE D))%1.
We assume SP is in some well-defined state:
.END
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
part of atom for X part of atom for Y
⊂αααααααπααα⊃ ⊂αααπαα ⊂αααααααπααα⊃ ⊂αααπαα
SP ~ ~ VALUE ~ #αβ→~ # ~ #→... ~ VALUE ~ #αβ→~ # ~ #→...
⊂αααα⊃ εααααααααααααλ %ααααααα∀ααα$ %αβα∀αα %ααααααα∀ααα$ %αβα∀αα
~ #αβα→~*MARK* #ααβ→⊃ ↓ ↓
%αααα$ εααααααααααααλ ↓ ⊂αααπααα⊃ ⊂αααπααα⊃
~ last entry ~ ~ ~ A ~ 1 ~ ~ B ~ 2 ~
εααααααααααααλ ~ %ααα∀ααα$ %ααα∀ααα$
...
↓
εααααααααααααλ←$
~*MARK* #ααβ→⊃
εααααααααααααλ ↓
...
.END
.APART
.GROUP
.BEGIN CENTER
Now we save the currrent value of %3X%* and the location of its value#cell:
.END
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂αααααααααα→αααααααα→αααααααααα→αααααααααα→α⊃
SP ~ ↑ ~ ~
⊂αααα⊃ εααβααπαααααλ ~
~ #ααβαα→~ # ~ #ααβ→ααα⊃ part of atom for X ↓
%αααα$ εααααα∀αααααλ ↓ ⊂αααααααπααα⊃ ⊂αααπαα
~*MARK* #ααβ→⊃ ~ ~ VALUE ~ #αβ→~ # ~ #→...
εαααααααααααλ ↓ ~ %ααααααα∀ααα$ %αβα∀αα
~last entry ~ . ↓ ↓
εαααααααααααλ . ~ ⊂αααπααα⊃
... . %αααααα→ααααααααα→αααααααα→~ A ~ 1 ~
%ααα∀ααα$
.END
.APART
A similar "push" saves the value of %3Y%*. Once this is done we are
free to bind %3X%* and %3Y%* to the new values.
.GROUP
.BEGIN CENTER
Here's what we have after rebinding %3X%* to %3C%* and %3Y%* to %3D%*:
.END
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂αααααααααα→αααααααα→αααααααααα→αααααααααα→α⊃
~ ~
SP ~ ↑ ↓
⊂αααα⊃ εααβααπαααααλ ~
~ #ααβαα→~ # ~ #ααβ→ααα⊃ part of atom for Y ↓
%αααα$ εαααααβαααααλ ~ ⊂αααααααπααα⊃ ⊂αααπαα
⊂α←ααβαα# ~ #ααβ→⊃ ~ ~ VALUE ~ #αβ→~ D ~ #→...
~ εααααα∀αααααλ ~ ~ %ααααααα∀ααα$ %ααα∀αα
↓ ~*MARK* #→ ↓ ↓
~ εαααααααααααλ ~ ~
~ ~ last entry~ ~ ~
~ εαααααααααααλ ↓ ~ ⊂αααπααα⊃
↓ ~ %ααααααααααα→ααααααααααααα→~ B ~ 2 ~
~ ⊂αααπααα⊃ ~ ⊂αααπααα⊃ %ααα∀ααα$
%α→~ C ~ #αβ→ ... %αα→ααα→~ A ~ 1 ~
%ααα∀ααα$ %ααα∀ααα$
part of atom X
.END
.APART
.GROUP
.BEGIN CENTER
Then we top off SP to acknowledge a block of saved bindings:
.END
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
SP ~ ~
⊂αααα⊃ εααααααααααααλ
~ #αβα→~*MARK* #ααβ→⊃
%αααα$ εααααααααααααλ ~
~ saved Y ~ ↓
εααααααααααααλ ~
~ saved X ~ ↓
εααααααααααααλ←$
~*MARK* #ααβ→⊃
εααααααααααααλ ↓
...
.END
.APART
Notice that %3X%* and %3Y%* have values %3C%* and %3D%* respectively
now and the previous bindings are saved on
the SP-stack. We may now begin the evaluation of the expression %3(%9x%*#X#Y)%1
assured that we will get the expected values for %3X%* and %3Y%* and
assured that we will be able to restore the previous values afterwards.
%3unbind%* simply "pops" entries off of the top of SP, using
the information stored there to restore the old values; %3unbind%*
stops as soon as it has popped the first block.
This implementation works quite well for
simple λ-binding and lookup. Changing environments is a bit of work, but
the access to the values of variables is relatively rapid, particularly
if we make sure that the value cell is always stored at a known position
relative to the beginning of the property list.
In describing this implementation we have used more representation than
might seem appropriate. In particular the representation of the value cell
as a linked list seems unnecessarily explicit. This was done to illuminate
the pointer modifications involved in binding and unbinding.
How can we
implement the equivalent of %3FUNARG%* in this new binding
organization? Recall how deep binding coped.
When we recognized an instance of a functional argument we saved a pointer to
the current environment. When we %2applied%*
the functional argument we restored the symbol table in such a way
that global variables were accessed in the saved environment while
local variables were found in the current environment.
We must try to do the same with the shallow-binding.
The action taken when a functional argument is recognized is quite similar
to our previous solution:
when %3function%* is seen save the current SP pointer, rather than
the current environment name. This action manufactures a triple
%3(FUNARG#%*<function>#<old SP>%*)%1. However the action required when we
wish to apply the functional argument is much more complicated. Before,
we just set up a new access chain such that the local table referred to
the environment saved by the %3FUNARG%* construction whenever a global
value was needed.
The main problem with the shallow-binder is that the Special Pushdown
List only reflects the %2incremental%* changes in the environments
during the computation. To retrieve the environment current when the
functional argument was bound, we must unwind SP back to the state
which was then operative; we must also save the %2current%* state
so that we may return to %2it%* when finished with the functional argument.
Now for more detail:
when we apply the functional argument,
.BEGIN centerit;
%3←(FUNARG %*<function> <old SP>%*) %1to arg%41%*; ... arg%4n%3 ,
.END
we will first rebind all of the variables found by the old SP pointer to
those old values, while saving the current values in SP. Then we can rebind the
λ-variables (i.e., local variables) of the <function> to arg%41%* through
arg%4n%*. The environment will then be properly restored for evaluation of the
function body. When the evaluation has been completed we restore using %3unbind%*.
This process is complex enough to warrant an example:
.CENT(An example of shallow binding and %3FUNARG%*)
As an example of the complications involved in handling functional arguments
in shallow binding we offer the following:
.BEGIN INDENT 0,5;GROUP
%2I%* Assume that %3x%* initially has value %31%* and the SP pointer
is at location %bSP1%*,
%2II%* then assume that a λ-binding rebinds %3x%* to %32%*;
%2III%* in this new context, assume a functional argument, %3g%*,
is to be bound to a function-variable %3f%*.
%2IV-V%* As the computation continues %3x%* is rebound first to %33%* and
within %2that%* context rebound again to %34%*.
%2VI%* Finally %3f%* is applied; this will resurrect %3g%*⊗↓From the value#cell
of %3f%* as %3(FUNARG G SP2)%*.← requiring
a restoration of the environment current at step %2III%*.
.END
.GROUP
Steps %2I%* through %2V%* would lead to the following sequence.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
F: (FUNARG G #)
X: 1 X: 2 X: 2 ↓
εααα∀αααλ εαααβαααλ ⊂ααα←ααα$
SP1 αα→~*MARK* ~ =II=> SP2 αα→~ X ~ 1 ~ =III=> ↓
εαααπαααλ αααα∀αααλ SP2 ∀α→~ X ~ 1 ~
... ~*MARK* ~ ...
εαααπαααλ
...
SP1 αα→~ ... ~
X: 3 X: 4
=IV=> SP3 αα→~ X ~ 2 ~ =V=> SP4 εαααβαααλ
εαααβαααλ %α→~ X ~ 3 ~
. . . εαααβαααλ
SP2 αα→~ X ~ 1 ~ . . .
εαααβαααλ
...
.END
.APART
Now to apply the functional argument: %3(FUNARG G SP2)%*.
This is accomplished by tracing down the SP stack with a pointer %bSP*%*,
moving from %bSP4%* --the#current#stack#pointer-- down to %bSP2%*
--the#%3FUNARG%*#pointer--, reversing all the intervening bindings on SP
and putting the saved values back into the value#cell.
The pattern of these reversals must be saved; we do this by adding a new
block above %bSP4%*.
.GROUP
Thus, steps %2VII%* and %2VIII%*:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
X: 2
εαααβαααλ
X: 3 SP6 α→~ X ~ 3 ~
...
εαααβαααλ εαααβαααλ
SP5 α→~ X ~ 4 ~ SP5 α→~ X ~ 4 ~
εαααβαααλ εαααβαααλ
... ...
SP4 α→~ X ~ 3 ~ SP4 α→~ X ~ 3 ~
εαααβαααλ εαααβαααλ
... ...
=VII=> SP3 α→~ X ~ 2 ~ ←αα SP* =VIII=> SP3 α→~ X ~ 2 ~
... ...
εαααβαααλ εαααβαααλ
SP2 α→~ X ~ 1 ~ SP2 α→~ X ~ 1 ~ ←αα SP*
... . . .
.END
.APART
Now we are in a position to evaluate the call on %3g%*; when we are finished
with %3g%* we will use the unbinding mechanism to reinstate the world
as it existed at %bSP4%*. This process will restore the value#cells using
the areas of the stack between %bSP6%* and %bSP4%*.
This implementation of
shallow binding obviates the symbol table search while incurring added
expense in binding and unbinding.
Functional arguments are more difficult since there
is only one symbol table, not the stack of tables implicit in the
previous incarnation. True, the information necessary to recover an environment
is present in SP, but it is expensive to retrieve it.
Though the stack implementation of shallow binding
%2will%* perform for functional arguments,
it will involve even more complexity if we wish to handle functional values.
The difficulty is the same as that for the stack implementation of
deep binding: the %3FUNARG%* will point
"up" the SP stack rather than "down".
A straightforward application of the technique used for functional arguments
will not work. At the time we wished to apply the functional value
its saved SP-pointer will be pointing into a section of the SP stack
which no longer reflects the proper state.
For when we leave the environment which created the
functional value the current unbinding mechanism will cut the stack back to
the point which existed when we entered that environment.
A difficulty arises even in the case of functional arguments:
since a %2copy%* of the binding environment is being generated in the unwinding
process, changes in %2that%* environment will not be reflected when we restore
the activation environment.
The real difficulty is in attempting to
represent the tree-like structure of environments by using the essentially
linear representation of a stack.
.ss(Strategies for LISP implementation,,P228:)
ref count, B-W, W-C.M., RG
The discussion of the last three sections should be related to the earlier
discussion of binding strategies, Weizenbaum environments, functional arguments
and the non-recursive evaluators.
Our mapping of the binding implementations (shallow or deep) onto a stack
is sufficient for the great majority of current LISP programs. However as
the LISP community explores new ways of using the language, they have begun to
expect that the full power of functional values be available; and have
proposed extensions in the LISP control structure to allow even non-recursive
control. Since these topics are of current research interest it is not
clear how lasting their impact will be. We will sketch a few of the
ideas involved and indicate how the techniques we have dsicussed in this chapter
can be applied.
We begin with a review of the initial evaluator of {yonss(P6)}. In
that implementation we kept the
symbol tables available by representing them as list-structure and therefore
they became a garbage collectable commodity. As long as a %3FUNARG%* construct
accessed a table then the table was retained. Essentially we had removed
the stack-like behavior of symbol-table accesses which occurs most of the
time and replaced it with a general scheme which
works for all cases but incurs a significant overhead in even the most
simple of function calls.
We would like an intermediate solution:
one which works for all cases and minimizes overhead in the
typical call. Such a scheme can indeed be implemented.
Recall our discussion of garbage collection in {yonss(P146)}. There we said
that a garbage collector was used in LISP since the interrelationships
which we generated in the data structure manipulations were
sufficiently intertwined that it was not possible to use less
sophisticated methods to determine whether a structure was still active.
Now local symbol tables %2are%* data structures; the discussion of
Weizenbaum environments in {yonss(P77)} should have convinced you of that.
They are chained
together in a manner reminiscent of that of our implementation
of S-exprs; indeed as we have just mentioned LISP's attitude toward
management of such tables was to garbage collect them. However the %2behavior%* of
tables during the execution of a program is much less complex than that
of arbitrary list structure. Again as we have just seen the behavior
is quite rational and predictable except for procedure-valued variables.
A solution giving a reasonable implementation based on
the alternative storage management scheme of ⊗→reference counter↔←s, which we
described on {yon(P145)}, is described in a paper by D. Bobrow and B. Wegbreit.
The effect of their representation of the control for LISP is that
little overhead is extracted if the program does not use the more
exotic features: a stack-like device results. A larger toll is
paid for use of more general control regimes.
**loss on complier optimization***
*** W CM and RG on shallow***
It %2is%* possible to consider combining the use of the value cell
with either
shallow and deep strategies.
We have %2both%1 a value cell and either
name-value trees or shallow bound p-lists. We will try to use
the value#cell as much as possible.
We associate an extra piece of information with each value attached to
any value#cell, telling the binding time of that variable.
We have a global indicator telling the current context which we are using:
E%4current%*, say. When we want the value of a variable, we first go to the
augmented value#cell; if the binding time indication is that of
E%4current%1, then the value is correct and we take it. If the
indicators disagree, we use the full %3lookup%1 strategy;
when we find the variable we update the value#cell and change the binding
time indicator to E%4current%*. This way we only search the stacks for the
first access to a variable; after that we can justifiably use the value cell until
we change context again.
.SS(Epilogue)
←%2Subtitled: But my Fortran program doesn't do all this crap!%1
It is quite true that a running Fortran, PL/1, or Algol program
is far less complicated as far as its symbol accessing mechanisms
are concerned. In Fortran there is a simple relationship between
variables and memory locations which will contain their values;
a Fortran evaluator can assign fixed locations to the variables
in a program.
In Algol, there is a simple relationship between variables and positions
in the run-time stack; an Algol evaluator cannot assign fixed locations, but
it can replace the variable lookup with a simple address calculation. This
is partly due to Algol's use of static binding and partly due to its
restrictions on procedure-valued variables.
In LISP, both the quality and the quantity of variables can change.
Arbitrary properties can be associated with atoms at run-time.
Indeed, the symbol table mechanism of LISP is more reminiscent of
that associated with the Fortran or Algol compiler. For these
languages it is the compiler which performs the mapping
from source language to running machine code. It is the compiler's
responsibility to discover the properties associated with each variable.
The compiler can do this because the semantics of the language is such
that at compile time all, or almost all, of the properties
of the variables are known.
This is not true for LISP. In general you cannot tell until run time what
the attributes of a particular atom are. The situation is really even worse
than this. Since programs and data are indistinguishable, we can construct a
list using the data structure facilities and the turn right around and evaluate
that list as a representation of a LISP expression.
However,
a large majority of LISP computations fall into a much more
disciplined set, and for those computations, it is reasonable to think
about applying some of the ideas available for Fortran or Algol
translators. That is we don't use all of the generality available in the language
and we can therefore reduce some of the variable look up, for example.
For these kinds of computations, it might be appropriate to compile out
the uneeded generality.
There are LISP compilers, typically written
in LISP. They can make many decisions at compile time
about the properties of variables, but in general the compiled code
will be interspersed with calls on %3eval%*
because the compiler just doesn't know what to do.
This implies that compiled and interpreted code must be able to
communicate with each other. A piece of compiled code can call a
λ-expression or conversely. The execution
of the program should be totally transparent as to whether any, or all, or
none of the functions are compiled. This means that the calling
sequences for both kinds of functions must be compatible. Less
obvious and by far more troublesome, is the communication of the values of free
variables. The next chapter of the text discusses
the run-time behavior required for implementations of LISP-like languages
and will include a discussion of LISP compilers.
.END "JRA";